ABSTRACT


Traditionally, database management systems have been presented and evaluated with the assumption of perfect information. In this thesis, the problem of dealing with imperfection in a database is discussed and a formal framework for a general treatment of imperfection is proposed. Incompleteness and inconsistencies in the stored values are the two manifestations of imperfection we consider.

In order to have a clear, well-defined framework and to illustrate our concepts with specific examples, we concentrate on one database model: the relational model. The organization of databases, the semantics of operations on them (queries and modifications) and certain semantic rules (functional dependencies) are examined in detail when imperfect information is present. Finally, it is shown that our work has an important application to traditional relational database theory, where a basic and controversial assumption is weakened with the use and careful treatment of imperfection.




ACKNOWLEDGEMENTS 


Being a graduate student is a great experience: four exciting years, sometimes with disappointments, other times with fun. The biggest thrill, though, is when everything is over and you are ready to express your respect to all those who helped make your graduation a reality.

First,  I  would  like  to  thank  my  supervisor,  Dennis  Tsichritzis.  He  stood  by 
me  as  an  exceptional  advisor  and,  very  importantly,  as  a  friend.  He  provided 
continuous  support,  sharp  ideas  and  valuable  suggestions. 

Fred Lochovsky is the person who followed my research more closely than anybody else. He helped me enormously in technical matters, in writing style, and in detecting the (numerous) typos in this thesis. In addition, Fred was my best friend during my graduate tenure.

I also thank the other members of my committee, Ken Sevcik, C.C. Gotlieb and Derek Corneil, for their valuable suggestions in forming the final version of this thesis. I am grateful for their interest and patience in spending so much time with me. I also thank John Mylopoulos for our stimulating discussions. My external reader, Catriel Beeri, deserves special thanks. In addition to the help he provided with his expertise in the area, he had to make a long and adventurous trip in order to be present at my oral.

My  colleague  and  friend,  Marc  Graham,  is  the  person  that  everybody  needs 
to  keep  him  going.  Our  everyday  conversations  and  arguments,  in  constant 
search  for  the  right  direction  for  both,  were  invaluable. 

Foremost, I would like to thank my parents. Their love and continuous support, even though I was far away, kept me in it. I am proud to be their son.

I  can’t  stop  without  mentioning  all  the  special  heroes. 


Digitized  by  the  Internet  Archive 
in  2018  with  funding  from 
University  of  Toronto 


https://archive.org/details/technicalreportc123univ 


My "buddies" in the office, Don Batory, Bruce Caller, Barbara Bell, Chris Lengauer, Ivor Ladd and Ignacio Casas-Raposo, provided the human aspect in research. They were even patient with my singing!

Nobody can stay alone for so many years. I take the opportunity to thank my girlfriends, at different times, Cheri, Lisa, Vicky, Mary, Mary again, Meegan, Ann and Jane.


Lastly, I would like to thank UNIX and TROFF.



TO MY PARENTS




Table of Contents

PART I  INTRODUCTION

Chapter 1  Introduction
1.1  Statement of the Problem
1.2  Motivation
1.2.1  Database Translation
1.2.2  Distributed Databases
1.2.3  Theory of Data Dependencies
1.3  Other Approaches to the Formal Treatment of Null Values
1.3.1  Many-valued Logic Approaches
1.3.2  Approaches in the Framework of Two-valued Logic
1.4  Summary of Our Approach

PART II  OPERATIONS IN THE PRESENCE OF IMPERFECTION

Chapter 2  The Framework
2.1  The Relational Model
2.2  Formal Semantics of a Database
2.3  Imperfection in a Relational Database
2.4  Extension of Functions Between Domains

Chapter 3  Query Evaluation and Modification Operations
3.1  Least Extensions of Queries
3.2  An Algorithm for Query Evaluation
3.3  Modification Operations

PART III  THEORY OF FUNCTIONAL DEPENDENCIES AND DATABASE DESIGN

Chapter 4  Functional Dependency Extensions
4.1  Introduction
4.2  Functional Dependencies and Their Interpretation
4.3  Functional Dependencies in Relations with Null Values
4.4  Inference Rules for Functional Dependencies
4.5  Satisfiability for a Set of Functional Dependencies
4.6  Summary of Results

Chapter 5  A Weaker Form of the Universal Relation Assumption
5.1  Introductory Concepts
5.2  The Universal Relation Assumption and Database Consistency
5.3  A Critique of the UR-assumption
5.4  Weakening the UR-assumption
5.5  Testing for Database Consistency
5.6  Tree-Structured Databases

PART IV  CONCLUSION

Chapter 6  Concluding Remarks

References

Appendix


Table of Figures

1.1  Example of a DBTG-set
1.2  Example of a distributed database (1, 2, and 3)
2.1  Examples of domain extensions (a and b)
2.2  Cartesian product domain extension
2.3  Illustration of the extension of negation and conjunction (1, 2, 3 and 4)
3.1  Example of a relation instance
3.2  Algorithm for query evaluation
4.1  Example of a relation scheme with functional dependencies
4.2  An instance of a relation scheme
4.3  Sound inference rules for FDs
4.4  An instance of a relation scheme with nulls
4.5  Examples of FDs with null values
4.6  An example illustrating that transitivity does not hold
4.7  Algorithm to test for FD satisfiability
4.8  An example illustrating the importance of order in applying the NS-rules
5.1  A database instance
5.2  A database instance which does not satisfy CIP
5.3  An example of outer-join
5.4  An example illustrating that outer-join is not associative
5.5  A database instance where one outer-join is not consistent
5.6  Algorithm for the localized test
5.7  Algorithm for the localized completion
5.8  Algorithm for the application of an NS-rule
5.9  Algorithm for consistency testing in insertions
5.10  An example illustrating the localized test (1, 2, 3 and 4)
5.11  A globally consistent database instance
5.12  An example of an instance which is locally consistent but globally inconsistent


PART  I 


INTRODUCTION 


In this thesis we deal with the general problem of treating imperfection in a database. Incompleteness and inconsistencies in the stored values are the two forms of imperfection we consider. The problem is discussed in detail in chapter 1, and motivation is given for a formal solution of this problem. It is demonstrated that, because of some new research directions in database management, the "easiest" method of dealing with the problem, i.e., trying to avoid it, is inappropriate. Related research is critically discussed in depth. The last section of that chapter constitutes a brief, informal presentation of the results and contributions of this thesis.


Chapter  1 


Introduction 


As  for  me,  all  I  know  is  that  I  know  nothing 

-  Socrates 


1.1.  Statement  of  the  problem 

The database user believes that his database provides a faithful representation of the portion of the real world that it models. Unfortunately, this is not always the case. The database may not have all the data corresponding to the real-world situation, nor may its current contents be free of inconsistencies. The general problem dealt with in this thesis is: "How should the database be organized so that its imperfection does not hinder its usefulness to the user?"

We  first  define  what  we  mean  by  imperfection  in  a  database  by  exploring  the 
notions  of  possibility  and  indeterminacy.  Consider  the  statement: 

"The  world  will  be  destroyed  by  the  turn  of  the  century." 

The truth value of this statement is indeterminate. The statement may well turn out to be true in the year 2000, but its truth value cannot be determined now. Statements of this kind have created some confusion between the notions of possibility and indeterminacy, although the two notions are clearly distinct. For instance, consider the statement:

"Joe Clark was elected prime minister of Canada."

This statement is possibly true (since it is actually true). On the other hand, we cannot say that this statement is indeterminate. In summary, possibility implies that something is either true or false, even if we do not know which. By contrast, indeterminacy implies that something is neither true nor false.

A database is a model of a portion of an evolving real world. Assume that entities are modelled in terms of their properties. Here, a property is a generic term like employee-type or marital-status. What we store in a database are values like "typist" or "single" that correspond to particular instances of properties. It is natural that on many occasions the database will not correspond exactly to the real-world instance; that is, it will not have all the values it needs for an accurate correspondence.

Consider an employee who in reality has the employee-type property instance "manager", but this fact is not recorded in the database instance. There could be several reasons for this situation, such as confidentiality, ignorance, or uncertainty. The fact that the database has no information on this property of the employee is in itself information. This fact is usually recorded by a special value, called a null value. This null value may be interpreted as being any possible "typed" value, but it is not known exactly which value. The restriction "typed" refers to the domain of values that the property instances can take. In our example, the employee may in fact be a manager. However, if a null value is recorded as his employee-type, then it is also possible that he is a janitor or a typist or any other employee type. For whatever reason, the database is not able to determine which value applies, and so the value null is recorded.
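The possibility reading of the null value can be made concrete with a small Python sketch. It assumes a finite, hypothetical employee-type domain; the names MISSING, possibly and certainly are illustrative and are not the thesis's notation. The null stands for every value of its typed domain, so a condition on it can be possibly true without being certainly true.

```python
# Hypothetical finite domain standing in for the employee-type property.
EMPLOYEE_TYPES = {"manager", "janitor", "typist"}

# Sentinel for the null value: "some value of the domain, unknown which".
MISSING = object()


def possible_values(value, domain):
    """Return the set of domain values a stored value may stand for."""
    return set(domain) if value is MISSING else {value}


def possibly(pred, value, domain):
    """True if pred holds for at least one value the stored value may denote."""
    return any(pred(v) for v in possible_values(value, domain))


def certainly(pred, value, domain):
    """True if pred holds for every value the stored value may denote."""
    return all(pred(v) for v in possible_values(value, domain))


def is_manager(v):
    return v == "manager"


print(possibly(is_manager, MISSING, EMPLOYEE_TYPES))     # True
print(certainly(is_manager, MISSING, EMPLOYEE_TYPES))    # False
print(certainly(is_manager, "manager", EMPLOYEE_TYPES))  # True
```

For a stored non-null value the two tests coincide, since the set of possible values has a single member.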
Returning to our introductory discussion on possibility and indeterminacy, it is clear that the null value expresses possibility. From now on we will call this value missing. The natural question at this point is: what about properties that are indeterminate in the real world? Two of the properties of our familiar employee are marital-status and spouse’s-name. The values that marital-status may take are "single", "divorced", "separated" or "married". Spouse’s-name may be any name. Suppose that in the instance of the real world we are modelling, the employee is "single". In this case, we cannot give spouse’s-name a value, since we would then have a very clear inconsistency in the database. The value missing does not help since, according to our interpretation, missing can possibly be any name and is determined in the real world that the database is modelling. Here we have an indeterminacy. We can only determine the actual value of this property in another world instance in which the employee is no longer single.


The information we want to record in such a case is the fact that the employee cannot have a value for this property. The special value used to record this fact is usually referred to as non-applicable. In this thesis we take the approach of [Biskup 80] and treat this value as a regular value. The non-applicable value is introduced in each domain as property-dependent. For instance, for the property spouse’s-name we introduce a value no-name denoting the fact that an employee does not have a spouse. Similarly, for the property phone-no of a company’s telephone directory, we introduce the value none denoting the fact that an employee does not have a telephone connection.
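Treating non-applicable as a regular, property-dependent value can be sketched in Python as follows. The values no-name and none come from the text; the dictionary layout, the small sample of names and phone numbers (standing in for the full domains), and the function admissible are hypothetical. Because no-name is an ordinary domain member, a question such as "does this employee have no spouse?" is answered by plain equality, with no special null logic.

```python
# Each property's domain (a small hypothetical sample), extended with its own
# property-dependent non-applicable value treated as a regular value.
DOMAINS = {
    "spouse's-name": {"Anna", "Maria", "Helen", "no-name"},
    "phone-no": {"978-2011", "978-4452", "none"},
}


def admissible(prop, value):
    """A stored value is admissible if it belongs to the property's extended domain."""
    return value in DOMAINS[prop]


employee = {"name": "Joe", "marital-status": "single", "spouse's-name": "no-name"}

# An ordinary equality test suffices: no-name is just another value.
has_no_spouse = employee["spouse's-name"] == "no-name"
print(admissible("spouse's-name", employee["spouse's-name"]), has_no_spouse)  # True True
```

The design choice mirrors the text: each domain gets its own non-applicable value, so none is not admissible for spouse's-name and no-name is not admissible for phone-no.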

There is another manifestation of indeterminacy in a database that is not captured by the value non-applicable. Consider the case where, for an employee, information exists in the database indicating that his salary is both $10,000 and $11,000. Such a situation can arise when many users are permitted to enter data, all of which are assumed to be equally reliable. In a more realistic environment, this situation may frequently occur if the database is distributed and the distributed data must be combined (e.g. to answer a query). This type of indeterminacy may be recorded in a database with a special value called the inconsistent or nothing value. This value expresses indeterminacy in that the database is modelling more than one real world: the worlds that each of the users who inserted the data perceive. Since all users should, ideally, perceive the same real world, this situation is, in effect, a design or an operational problem. Some outside guidance is required since the database cannot determine who is correct. A tie-breaking scheme may be devised (e.g. a "Grand" user) to resolve the inconsistencies. In this case no inconsistent values appear in the database. However, we are concerned with cases where the indeterminacy does appear in a database and we must deal with it.

Treatment of inconsistent values may vary in a database. One way of treating inconsistencies is to consider them as partial-inconsistencies. By partial-inconsistencies we mean special values which have a set of regular values associated with them. Consider again the example of the employee with the two salaries. The inconsistency of the salary value can be represented with a subscripted null together with the associated set {$10,000, $11,000}. Thus, the statement:

"The employee earns $10,000"

is both true and false: it is true with respect to one of the recorded salaries and false with respect to the other. On the other hand, the statement:

"The employee earns $20,000"

is false.
Another way of treating inconsistencies, which is less precise than the above, is not to allow partial-inconsistencies. The rationale behind this treatment is that once it is admitted that there is an inconsistency, i.e., more than one value is used for the same property, all the values are suspect. Hence, in this approach, which we adopt, both of the above statements are neither true nor false. If more precision is required, other values with special meaning may be introduced.† The two interpretations we introduced, missing and nothing, are general cases of all other manifestations of null.
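The difference between the two treatments of the salary example can be sketched in Python. The four truth labels and the functions eval_partial and eval_nothing are hypothetical names chosen for illustration; the thesis's own formal treatment is developed in later chapters. Under partial inconsistency the recorded set {$10,000, $11,000} is kept; under the adopted approach any conflict collapses to nothing.

```python
TRUE, FALSE = "true", "false"
BOTH = "both true and false"
NEITHER = "neither true nor false"


def eval_partial(recorded, amount):
    """Partial-inconsistency view: keep the (non-empty) set of conflicting salaries."""
    outcomes = {s == amount for s in recorded}
    if outcomes == {True}:
        return TRUE
    if outcomes == {False}:
        return FALSE
    return BOTH  # true for one recorded value, false for another


def eval_nothing(recorded, amount):
    """Adopted view: any conflict makes every stored value suspect (nothing)."""
    if len(recorded) > 1:
        return NEITHER
    return TRUE if amount in recorded else FALSE


salaries = {10_000, 11_000}
for amount in (10_000, 20_000):
    print(amount, eval_partial(salaries, amount), "|", eval_nothing(salaries, amount))
# 10000 both true and false | neither true nor false
# 20000 false | neither true nor false
```

The second line of output shows exactly the precision that is given up: the partial view can still reject $20,000, while the nothing view cannot.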

Summarizing, we consider three types of null values in this thesis. The non-applicable value is treated as a regular value. The missing value expresses possibility: it is not determined in the database, but may be determined in the modelled real-world instance. Missing arises from the inability of the database to model accurately the aspects of the real-world instance. The nothing value expresses indeterminacy, not only in the database but also in the real-world instance. Nothing arises from attempts of the database to record inconsistent information (i.e., to model more than one real-world instance). Databases with occurrences of missing and nothing values are said to contain imperfect information.

Traditionally, database management systems have been presented and evaluated in the absence of imperfect information. In this thesis, we examine the organization and behaviour of database systems in the presence of imperfection. In order to have a clear, well-defined framework and to illustrate our concepts by specific examples, we concentrate on one database model: the relational model [Codd 70]. The organization of databases, the semantics of operations on them (e.g. queries, insertions, deletions and updates) and certain semantic rules (e.g. functional dependencies) are examined in detail when imperfect information is present.

† [ANSI 75] lists at least 14 distinct manifestations of null. Among them: not valid for this individual, valid but does not exist yet, exists but not permitted to be stored, exists but not known, stored but not available, available but undergoing change, etc.
1.2.  Motivation

Very few database systems provide any support for null values. The database designer is usually very careful to prevent their appearance, with specialized schema construction and strict rules governing the insertion, deletion and modification of data values. We argue, however, that three major research trends, namely database translation/restructuring, distributed databases and the theory of dependencies, show clearly the inappropriateness of this approach and provide strong motivation for a more formal treatment of null values. We briefly discuss the three areas and give some simple examples.

1.2.1.  Database  Translation 

Data  Translation  is  defined  in  [SDTG  77]  as  the  process  whereby  data 
created  by  one  computer  is  transformed  into  a  form  which  can  be  processed  by 
another  computer.  Application  migration  to  a  new  computer  system,  translation 
of  data  from  one  database  management  system  (dbms)  to  another,  and  database 
restructuring  are  all  examples  of  data  translation. 

The appearance of explicit null values in a database is sometimes suppressed and made implicit by the database structure. For instance, consider a DBTG database with the structure depicted in Figure 1.1.†


† We use "capital" letters for entity names (records, relations, etc.), "italics" for attribute names, and "bold" for key-attribute names.
PERSON (name, age)
    |
    | owns
    v
SHARES (id, value)

Figure 1.1  Example of a DBTG-set

Assume that the DBTG set owns is declared as optional-manual. In an instantiation, we will have no links if a person does not own shares or some shares are outstanding. The absence of a link indicates an implicit null value. With the current research interest in database and query translation, these implicit null values may become explicit, the target database being unable to hide them. For example, a straightforward way to translate this network database to a relational one is to use two relations:

PERSON(name, age)

SHARES(id, owner, value)

Assuming that some of the SHARES records are not in a network database set occurrence, we have null values appearing in the owner domain of the relation SHARES. These nulls may take the interpretation non-applicable or the more conservative one missing, since with manual membership the link may simply not have been established. Of course, there are other ways to translate this network structure, e.g. with three relations PERSON, OUTSTANDING-SHARES, OWNED-SHARES or with PERSON, SHARES, OWNS, but other problems are created when such approaches are chosen, especially in modifications [Vassiliou and Lochovsky 80]. For instance, a change of ownership would require an update of the non-key owner attribute value in our two-relation organization, which is what one conceptually expects. On the other hand, with the three relations PERSON, SHARES and OWNS, we may have to delete a tuple from OWNS and insert another tuple (since the attribute would be part of the key).
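The two-relation translation can be sketched in Python, assuming the network instance is given as lists of PERSON and SHARES records plus a set of (person, share) links for the owns set. The in-memory layout, the sample data and the function translate are hypothetical, and Python's None stands in for the missing null. Shares that belong to no owns-set occurrence receive an explicit missing owner, making visible the null that the network structure kept implicit.

```python
MISSING = None  # stand-in for the missing null in this sketch

persons = [("Smith", 41), ("Jones", 35)]         # PERSON records (name, age)
shares = [("s1", 100), ("s2", 250), ("s3", 75)]  # SHARES records (id, value)
owns_links = {("Smith", "s1"), ("Smith", "s2")}  # owns-set memberships


def translate(persons, shares, owns_links):
    """Map the network instance to PERSON(name, age) and SHARES(id, owner, value)."""
    # A member record belongs to at most one occurrence of a DBTG set type,
    # so each share id maps to at most one owner.
    owner_of = {share_id: person for person, share_id in owns_links}
    person_rel = list(persons)
    shares_rel = [(sid, owner_of.get(sid, MISSING), value) for sid, value in shares]
    return person_rel, shares_rel


person_rel, shares_rel = translate(persons, shares, owns_links)
print(shares_rel)
# [('s1', 'Smith', 100), ('s2', 'Smith', 250), ('s3', None, 75)]
```

Here the unlinked share s3 is given missing, the conservative reading; a translation that knew the share to be outstanding could instead record a non-applicable value.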

Relations that have explicit null values may also be used in resolving the multiple view support problem, as shown in [Zaniolo 77]. Given a multiple view support facility, a database can appear in a different form for different users. Views mask various parts of the database, so that a user sees only the information within his authorization domain. In translating view updates to actual database updates, null values are introduced. For instance, a deletion of a tuple in the view, which contains only a subset of the attributes in a stored relation, may be translated to a change of the relation tuple's values to missing.
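A minimal Python sketch of that view-deletion translation follows, assuming a view that projects a stored relation onto some of its attributes. The EMPLOYEE data, VIEW_ATTRS, and the functions view and delete_from_view are hypothetical, and hiding rows whose visible non-key values are all missing is our own assumed policy, not necessarily the one in [Zaniolo 77].

```python
MISSING = "missing"  # illustrative marker for the missing null

# Hypothetical stored relation EMPLOYEE(id, name, phone, salary); "id" is the key.
employee = [
    {"id": 1, "name": "Ann", "phone": "978-2011", "salary": 30_000},
    {"id": 2, "name": "Bob", "phone": "978-4452", "salary": 28_000},
]

# The view shows only these attributes of EMPLOYEE.
VIEW_ATTRS = ("id", "phone")


def view(relation):
    """Project onto VIEW_ATTRS, hiding rows whose visible non-key values are all missing."""
    rows = []
    for row in relation:
        visible = [row[a] for a in VIEW_ATTRS if a != "id"]
        if all(v == MISSING for v in visible):
            continue
        rows.append(tuple(row[a] for a in VIEW_ATTRS))
    return rows


def delete_from_view(relation, key):
    """Translate a view deletion into setting the tuple's visible non-key values to missing."""
    for row in relation:
        if row["id"] == key:
            for attr in VIEW_ATTRS:
                if attr != "id":
                    row[attr] = MISSING


delete_from_view(employee, 2)
print(employee[1])     # {'id': 2, 'name': 'Bob', 'phone': 'missing', 'salary': 28000}
print(view(employee))  # [(1, '978-2011')]
```

The attributes outside the view (name, salary) are untouched, which is why the stored tuple survives the deletion with a missing value rather than disappearing.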

In database restructuring, it is often the case that a new attribute for an entity, previously considered unimportant, becomes necessary. Its addition may create problems if not all values are known or if in some cases it is non-applicable to have a value for this attribute. For instance, consider the relation:

CARS(make, year, engine-type)

where engine-type is either "water-cooled" or "air-cooled". Assume that radiator-capacity is to be added as a new attribute in the CARS relation. Clearly, for air-cooled cars a value for radiator-capacity is non-applicable; recording one would create a logical inconsistency in the database. It may be that the introduction of radiator-capacity in the relation is a bad design. Whether this is the case or not should not affect our ability to handle the representation, since it may not always be clear that we have a bad design.
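The restructuring step can be sketched in Python. The non-applicable value no-radiator, the sample cars and capacities, and the function add_radiator_capacity are hypothetical. Air-cooled cars receive the property-dependent non-applicable value, while water-cooled cars whose capacity is not yet known receive missing.

```python
MISSING = "missing"          # a value exists in the real world but is unknown
NO_RADIATOR = "no-radiator"  # hypothetical property-dependent non-applicable value

# Hypothetical CARS(make, year, engine-type) instance.
cars = [
    ("aircooler", 72, "air-cooled"),
    ("gasguzzlers", 80, "water-cooled"),
    ("fastcars", 80, "water-cooled"),
]
known_capacity = {("gasguzzlers", 80): 12.5}  # litres; only some are known


def add_radiator_capacity(cars, known):
    """Extend CARS(make, year, engine-type) with a radiator-capacity attribute."""
    extended = []
    for make, year, engine in cars:
        if engine == "air-cooled":
            capacity = NO_RADIATOR  # no value can ever apply
        else:
            capacity = known.get((make, year), MISSING)
        extended.append((make, year, engine, capacity))
    return extended


for row in add_radiator_capacity(cars, known_capacity):
    print(row)
```

The sketch keeps the two cases apart: no-radiator is a definite fact about the car, whereas missing only records the database's ignorance.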


1.2.2.  Distributed Databases

A distributed dbms is loosely defined as a system managing a logically integrated database which is physically distributed over several distinct computing facilities (in a computer network) [Deppe and Fry 76]. A common characteristic of these systems is that dispersed data must be collected at one computing facility. This may be a "global" node or a node issuing a query. The data collected may be all the data necessary to answer a query, or just the replies of locally processed queries, or even a combination of the above. When the distinct computing facilities store parts of the database in different ways, null values will naturally appear in the collection of dispersed data.

To  illustrate  the  point,  suppose  we  have  two  databases,  each  having  one 
relation.  An  instance  of  the  Detroit  database  is: 


CARS
make          year   mpg
gasguzzlers   80     15
fastcars      80     26

Figure 1.2.1

An  instance  of  the  Windsor  database  is: 


CARS
make          year   mpg   # of doors
turtles       79     55    4
gasguzzlers   80     15    2

Figure 1.2.2

In answer to a query about cars, the two databases are to be combined into a global database. We assume that a value which appears in only one database, like the # of doors of gasguzzlers, fills the missing value for the same attribute in the other database. An instance of this global database is:
CARS
make          year   mpg   # of doors
turtles       79     55    4
gasguzzlers   80     15    2
fastcars      80     26    -

Figure 1.2.3

Notice  that  we  have  a  "blank"  instead  of  a  value  for  the  number  of  doors 
that  the  "fastcars"  model  has.  This  is  not  because  the  car  does  not  have  doors, 
but  because  the  number  of  doors  is  unknown.  This  is  an  example  of  the  missing 
null  value. 

As another example of imperfect information in a distributed database, suppose that the tuples (neverstops, 79, 50) and (neverstops, 79, 52, 4) appear in the Detroit and Windsor databases, respectively. The global database tuple will have an inconsistency in the mpg attribute value, since two values appear (50 and 52). This results in the nothing value, which expresses indeterminacy, appearing in the global database.
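The combination rule used in these two examples can be sketched in Python as a per-attribute merge, assuming the tuples are aligned on the same attribute list (make, year, mpg, # of doors) and already matched on make and year. The names merge_value and merge_tuples and the sentinel strings are hypothetical. Missing yields to a known value, equal values are kept, and conflicting values become nothing.

```python
MISSING = "missing"
NOTHING = "nothing"


def merge_value(a, b):
    """Combine two recorded values for the same attribute of the same entity."""
    if a == MISSING:
        return b
    if b == MISSING:
        return a
    if a == b:
        return a
    return NOTHING  # conflicting values (or an existing nothing): indeterminate


def merge_tuples(t1, t2):
    return tuple(merge_value(a, b) for a, b in zip(t1, t2))


# Attributes: (make, year, mpg, # of doors); Detroit records no # of doors.
detroit = ("neverstops", 79, 50, MISSING)
windsor = ("neverstops", 79, 52, 4)
print(merge_tuples(detroit, windsor))  # ('neverstops', 79, 'nothing', 4)

print(merge_tuples(("gasguzzlers", 80, 15, MISSING), ("gasguzzlers", 80, 15, 2)))
# ('gasguzzlers', 80, 15, 2)
```

With these rules merging behaves like a least-upper-bound operation: missing is absorbed by any value, and nothing absorbs everything, so the order in which sites are combined does not matter.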


1.2.3.  Theory  of  Data  Dependencies 

The theory of data dependencies ([Codd 72], [Codd 70], [Beeri et al 78], [Fagin 77], [Fagin 79]) has recently attracted many researchers from the database and theoretical areas. The notion of dependencies, which is purely syntactic, has been introduced as a tool to capture semantics in a relational database. The goal of the theory is the design, with the dependencies as guidelines, of a "good" relational database schema which is conceptually meaningful and is free of certain update anomalies [Date 77].

Data dependencies (the semantic rules) are defined between attributes that belong to one relation. As a starting point, then, we need to assume that the portion of the real world can be modelled with one relation. This assumption is called the Universal Relation (instance) assumption (UR assumption). As a consequence, any relation in the database should be, at all times, the projection of a UR instance. In the absence of null values, it has been shown in [Honeyman et al 80] that finding a fast algorithm which tests the validity of the UR assumption is very unlikely (the problem is NP-complete). When we start with a UR and decompose it (according to the dependencies), it is trivially true that the UR assumption is valid. But the decomposed relations are usually further modified, and it is through these modifications that the property of being the projection of a UR instance may not be preserved.
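To make "the projection of a UR instance" concrete, the following Python sketch checks, for one given candidate UR instance, whether every stored relation equals the corresponding projection. The attribute names, data and functions are hypothetical. It only verifies a proposed instance; deciding whether any suitable UR instance exists is the hard problem cited above, which this sketch does not attempt.

```python
def project(instance, attrs, onto):
    """Project a set of tuples over attribute list attrs onto the sublist onto."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(t[i] for i in idx) for t in instance}


def is_projection_of(ur, ur_attrs, database):
    """database maps a tuple of attribute names to a set of tuples over them."""
    return all(project(ur, ur_attrs, list(attrs)) == rel
               for attrs, rel in database.items())


UR_ATTRS = ["apt", "occupant", "pet"]
ur = {("1A", "Lee", "cat"), ("2B", "Kim", "dog")}
db = {
    ("apt", "occupant"): {("1A", "Lee"), ("2B", "Kim")},
    ("occupant", "pet"): {("Lee", "cat"), ("Kim", "dog")},
}
print(is_projection_of(ur, UR_ATTRS, db))  # True

# Insert an occupant without a pet into one relation only: the candidate UR
# instance no longer projects onto the database.
db[("apt", "occupant")].add(("3C", "Roy"))
print(is_projection_of(ur, UR_ATTRS, db))  # False
```

Restoring the property after such an insertion would need a UR tuple for 3C and Roy with no known pet, which is exactly where null values enter the universal relation.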

There are two aspects of the UR assumption we want to explore. The first one has to do with the initial UR instance. Our claim is that null values, of the sort we have introduced, must inevitably appear. In the UR we are representing some value-independent facts. Since all these facts must appear as a combination of attribute values in a row of the UR instance, the requirement of having all rows filled with values means that one fact would imply a series of other facts. For example, the fact that an apartment has an occupant is independent of the fact that the occupant has a pet, and both are independent of the fact that there exists an apartment. Therefore, some missing values may appear in the UR instance. In addition, non-applicable values will possibly appear since, by definition, the UR is a bad design (all attributes must appear in the UR). For instance, both the attributes marital-status and spouse’s-name will appear in the same (the only) relation which models a company. Therefore, for single employees we will have the non-applicable value in the spouse’s-name column. Of course, data dependencies are defined in a context of no nulls. In order to allow for null values, we must carefully redefine dependencies together with their requirements of satisfiability and inference rules.
The  second  aspect  of  the  UR-assumption  has  to  do  with  the  modifiability  of
the  decomposed  relations.  A  decomposed  relation  instance  will  be  a  projection
of  a  UR  instance  only  if  some  null  values  are  inserted  into  (or  deleted  from)  the
UR  instance  at  the  time  that  the  decomposed  relation  is  modified.  Good
guidelines  for  inserting/deleting  nulls  would  provide  a  fast  algorithm  to  test  the
UR  assumption,  as  was  pointed  out  in  [Honeyman  et  al  80].  Since  the
UR-assumption  can  be  considered  as  a  constraint  on  the  database,  the  test  for
its  validity  is  very  important  and,  for  practical  reasons,  should  be  done
efficiently.


1.3.  Other  Approaches  to  the  Formal  Treatment  of  Null  Values 

We  distinguish  two  basic  types  of  approaches.  Approaches  that  use
many-valued  logics  are  illustrated  by  Codd's  work  ([Codd  75],  [Codd  79]).  In
addition,  there  are  approaches  that  remain  in  the  framework  of  the  familiar
two-valued  logic,  exemplified  by  Lipski's  work  [Lipski  79].  In  the  latter  category
we  will  also  discuss  research  which  concentrates  on  specific  aspects  of  the
problem.  For  example,  we  will  survey  work  on  certain  constraints  with  null  values
[Honeyman  80],  [Maier  80],  [Goldstein  80],  relational  operators  [Biskup  80],  and
multivalued  dependencies  with  nulls  [Lien  79],  [Walker  80].

1.3.1.  Many-valued  Logic  Approaches

When  the  database  contains  imperfect  data,  the  regular  two-valued  logic
system  we  use  in  extracting  information  from  the  database  does  not  seem
adequate.  For  instance,  consider  a  relation  about  employees  which,  among  other
things,  records  their  salary.  The  query  "Does  employee  Smith  earn  $10000?"  is
answered  by  the  database  system  with  either  yes  or  no  when  a  regular  salary
value  is  recorded  in  the  tuple  corresponding  to  Smith.  If  the  value  recorded  is
missing,  then  the  system  can  answer  neither  yes  nor  no  with  certainty;  it  may
answer  unknown  or  maybe.  Similarly,  if  the  value  recorded  as  a  salary  value  is
nothing,  then  the  system  may  give  an  answer  like  "there  is  an  inconsistency"  or
"yes  and  no".  The  above  discussion  leads  naturally  to  the  consideration  of  an
n-truth-valued  (n-valued)  logic  system  as  a  candidate  for  the  formal  treatment
of  null  values.  In  this  section  we  present  and  criticize  approaches  based  on
n-valued  systems.

An  obvious  approach  to  overcome  the  inadequacies  of  the  familiar  two-valued
logic  is  to  introduce  a  third  truth  value,  one  that  is  neither  true  nor  false  (e.g.
the  truth  value  unknown).  The  systems  of  Lukasiewicz,  Bochvar  and  Kleene,  and
the  Standard  Sequence,  are  the  most  important  formulations  of  such  logics
[Rescher  69].  A  three-valued  system  is  characterized  as  normal  if  its  three-valued
truth  tables  agree  with  the  usual  two-valued  ones  when  only  true  and  false  are
involved.  An  important  fact,  true  for  all  normal  systems,  is  that  any  three-valued
tautology  (a  well-formed  formula,  or  wff,  that  invariably  takes  on  the  truth  value
true  for  all  possible  assignments  to  its  constituent  propositional  variables)  is
also  a  tautology  in  two-valued  logic.  The  demonstration  of  this  fact  is  very
simple:  if  the  wff  uniformly  takes  on  true  whenever  its  variables  assume  values
from  the  list  true,  I,  false  (where  I  is  the  third  truth  value),  then,  since  the
system  is  normal,  the  wff  will  also  take  on  true  whenever  its  variables  take
values  from  the  abbreviated  list  true,  false.  Therefore,  the  wff  must  also  be  a
two-valued  tautology.  The  converse,  though,  as  we  will  show,  does  not  hold.
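The asymmetry can be checked mechanically for one normal system. The Python sketch below encodes Kleene's strong connectives as max and 1 - x over false = 0, unknown = 0.5, true = 1 (an encoding chosen for illustration). It confirms that these tables are normal and that p ∨ ¬p is a two-valued tautology but not a three-valued one.

```python
from itertools import product

# Kleene's strong connectives over the encoding false = 0, unknown = 0.5, true = 1.
F, U, T = 0.0, 0.5, 1.0

def k_not(a):
    return 1 - a

def k_or(a, b):
    return max(a, b)

def is_tautology(formula, n_vars, values):
    """True iff `formula` yields T for every assignment of `values` to its variables."""
    return all(formula(*vs) == T for vs in product(values, repeat=n_vars))

# Normality: restricted to {F, T}, the tables coincide with the two-valued ones.
normal = all(k_or(a, b) == float(a == T or b == T) and k_not(a) == float(a == F)
             for a, b in product((F, T), repeat=2))

def excluded_middle(p):
    return k_or(p, k_not(p))

print(normal,
      is_tautology(excluded_middle, 1, (F, T)),     # two-valued tautology
      is_tautology(excluded_middle, 1, (F, U, T)))  # but not a three-valued one
# True True False
```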

Logic  systems  are  characterized  as  truth-functional  if  the  following  principle
is  always  satisfied.  For  any  formula  $F(p_1, p_2, \ldots, p_n)$  expressed  in  terms
of  its  individual  terms,  and  an  interpretation  $V$  (i.e.  an  assignment  of  truth
values):  †

$$V(F(p_1, p_2, \ldots, p_n)) = F(V(p_1), V(p_2), \ldots, V(p_n))$$


†  If  we  want  to  be  more  precise  with  our  notation,  we  must  use  a  different  F  on  the
right-hand  side.  Notice  that  on  the  left-hand  side,  F  is  a  function  on  formal  symbols,
while  on  the  right-hand  side  it  is  a  semantic  function  (defined  on  truth  values).
For  instance,  with  F  being  the  function  $\lor$,  $V(p_1 \lor p_2) = V(p_1) \lor V(p_2)$.
Informally,  the  truth-functionality  principle,  which  is  attributed  to  Frege  [Boas
and  Janssen  79],  says  that  the  interpretation  of  a  statement  composed  of  several
substatements  is  the  logical  composition  of  the  interpretations  of  its
substatements.

A  formal  treatment  of  the  missing  value  has  been  given  for  the  relational
model  in  [Codd  75].  A  three-valued  logic  is  adopted  for  use  in  extracting  data
from  databases  with  null  values.  The  whole  approach  is  based  on  a  principle,
called  the  "null  substitution  principle",  that  gives  the  conditions  under  which  a
truth-valued  expression  takes  on  the  truth  value  unknown.  This  principle  is
consistent  with  the  truth-functional,  three-valued  logic  of  Kleene.  We  quote  the
principle  from  [Codd  75]:

"...A  truth-valued  expression,  involving  the  data  value  missing,  has  the  truth
value  unknown  if  and  only  if  both  of  the  following  conditions  hold:

a)  Each  occurrence  of  missing  can  be  replaced  by  a  non-null  value  (possibly
a  distinct  one  for  every  occurrence)  so  as  to  yield  the  value  true  for  the
expression;

b)  Each  occurrence  of  missing  in  the  expression  can  be  replaced  by  a
non-null  value  (possibly  a  distinct  one  for  each  occurrence)  so  as  to  yield
the  value  false  for  the  expression..."
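Read operationally, the principle says: try every way of replacing the occurrences of missing by domain values, and answer unknown exactly when both true and false are reachable. A brute-force Python sketch, assuming a small finite domain (the salary domain and the function name are hypothetical):

```python
from itertools import product

def null_substitution(expr, n_occurrences, domain):
    """Truth value of `expr` under the null substitution principle.

    `expr` takes one argument per occurrence of missing; each occurrence is
    replaced independently by every non-null value of the finite `domain`.
    """
    outcomes = {bool(expr(*vals)) for vals in product(domain, repeat=n_occurrences)}
    if outcomes == {True}:
        return "true"
    if outcomes == {False}:
        return "false"
    return "unknown"  # conditions a) and b) both hold

salaries = range(0, 20001, 1000)  # hypothetical finite salary domain
print(null_substitution(lambda s: s == 10000, 1, salaries))  # unknown
print(null_substitution(lambda s: s >= 0, 1, salaries))      # true
print(null_substitution(lambda s: s < 0, 1, salaries))       # false
```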

Codd  further  applies  the  null  substitution  principle  to  equality/inequality
testing,  set  inclusion,  set  membership,  etc.  The  relational  operators  are  also
reexamined  for  the  possibility  of  null  values  in  relations.  Generally,  a  relational
query  (expressed  either  in  the  relational  algebra  or  with  predicate  calculus
formulas)  has  two  results:  a  true-result  (all  tuples  for  which  the  specified  query
condition  evaluates  to  true)  and  a  maybe-result  (all  tuples  for  which  it  evaluates
to  unknown).
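A minimal Python sketch of the two results for a one-condition selection, assuming (our encoding) that None marks a missing value and that a comparison involving it yields a separate UNKNOWN value. Tuples whose condition is false appear in neither result. The relation and all names are hypothetical.

```python
MISSING = None       # a missing attribute value in this sketch
UNKNOWN = "unknown"  # the third truth value

def gt(a, b):
    """Three-valued a > b: UNKNOWN when either operand is missing."""
    if a is MISSING or b is MISSING:
        return UNKNOWN
    return a > b

def select(relation, condition):
    """Split a selection into a true-result and a maybe-result."""
    true_result, maybe_result = [], []
    for t in relation:
        v = condition(t)
        if v is True:
            true_result.append(t)
        elif v == UNKNOWN:
            maybe_result.append(t)
    return true_result, maybe_result

employees = [{"name": "Smith", "age": 62},
             {"name": "Jones", "age": 35},
             {"name": "Brown", "age": MISSING}]
sure, maybe = select(employees, lambda t: gt(t["age"], 50))
print([t["name"] for t in sure], [t["name"] for t in maybe])  # ['Smith'] ['Brown']
```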

A  data  model  which  is  by  itself  incomplete  is  expected,  at  least,  to  be
precise  about  what  it  can  describe.  It  has  to  give  non-misleading  replies  when
queried:  no  more  and  no  less  than  it  knows.  There  are  some  cases  where  the
null  substitution  principle  is  not  sufficient  as  the  model's  tool  for  a  precise
description  of  the  situation,  as  was  first  observed  in  [Grant  77].  The  use  of  the
principle  implies  that  familiar  identities  (tautologies,  theorems  or  provable
formulas)  will  fail  to  hold.  For  instance,  consider  the  relation:

EMPLOYEE(name,  dept,  age,  marital-status)

and  the  relational  calculus-like  query:

LIST  EMPLOYEE.name  WHERE

(EMPLOYEE.age > 50)  OR  (EMPLOYEE.age ≤ 50)

When  the  employee's  age  is  unknown,  the  name  returned  by  the  query  will
be  in  the  maybe-result.  But  the  qualification  condition  is  a  tautology  (all
possible  ages  are  considered),  and  the  query  is  expected  to  list  all  employee
names  in  the  true-result.
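The failure can be reproduced by brute force over a hypothetical finite age domain (18 to 70). Under the null substitution principle, each of the two occurrences of the unknown age is replaced independently, so some substitutions make the condition false. Substituting a single value for the one unknown age in both places, in the spirit of [Grant 77] (discussed later in this chapter), makes it true in every case.

```python
from itertools import product

AGES = range(18, 71)  # hypothetical finite domain of ages

def condition(age_1, age_2):
    # (EMPLOYEE.age > 50) OR (EMPLOYEE.age <= 50), one argument per occurrence of age
    return age_1 > 50 or age_2 <= 50

def classify(outcomes):
    outcomes = set(outcomes)
    if outcomes == {True}:
        return "true"
    if outcomes == {False}:
        return "false"
    return "unknown"

# Null substitution principle: each occurrence is replaced independently.
codd = classify(condition(a, b) for a, b in product(AGES, repeat=2))
# One value for the one unknown age, used at both occurrences.
grant = classify(condition(a, a) for a in AGES)
print(codd, grant)  # unknown true
```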

Consider  now  the  relation: 


CARS

make          year    mpg    #of doors
turtles        79      55        4
gasguzzler     90      15        2
fastcars       80      26        -

and  the  query:  "Return  all  the  2-door,  80  models,  and  all  the  4-door  models
with  a  fuel  consumption  of  more  than  20  miles  per  gallon",  expressed  in  a
relational  calculus-like  language:
LIST  CARS  WHERE

((CARS.year=80)  AND  (CARS.#ofdoors=2))

OR  ((CARS.mpg>20)  AND  (CARS.#ofdoors=4))

Its  qualification  condition  can  be  written  as  $(p \land q) \lor (r \land \neg q)$,  where
p,  q,  r  are  the  simple  predicates  p  =  (CARS.year=80),  q  =  (CARS.#ofdoors=2),
and  r  =  (CARS.mpg>20).  Assume  that  the  domain  of  #ofdoors  is  exactly  {2,  4},
so  that  $\neg q$  holds  exactly  when  CARS.#ofdoors=4.  If  p  and  r  have  equal  truth
values,  then  the  value  of  the  formula  is  independent  of  the  value  of  q  and  is
equal  to  the  value  of  p.  This  fact  is  expressed  by  the  theorem
$(p \equiv r) \rightarrow (((p \land q) \lor (r \land \neg q)) \equiv p)$,  where  $\rightarrow$  stands  for
"implies".  If  q  is  unknown,  the  theorem  does  not  hold  in  the  three-valued  logic
used  by  Codd:  with  p  and  r  both  true,  each  disjunct  evaluates  to  unknown.
Hence,  the  query  will  evaluate  to  unknown  for  the  tuple  (fastcars,  80,  26,
missing).  But  a  car  is  either  2-door  or  4-door.  If  it  is  2-door,  the  first  term  of
the  OR  condition  evaluates  to  true  for  this  tuple,  and  if  it  is  4-door,  the  second
term  evaluates  to  true.  Therefore,  the  tuple  should  be  in  the  true-result  of  the
query.  Notice  that,  in  our  intuitive  considerations,  the  truth-functionality
principle  does  not  hold:  we  did  not  establish  the  truth  value  of  the  query  by
merely  considering  the  truth  values  of  each  term.  The  truth  value  of  q  is  not
independent  of  the  truth  values  of  p  and  r.
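Both evaluations of the fastcars tuple can be replayed in a few lines of Python. The sketch assumes Kleene's strong connectives encoded as min, max and 1 - x over {0, 0.5, 1}, together with the stated domain {2, 4} for #ofdoors. The truth-functional evaluation yields 0.5 (unknown), whereas evaluating the condition on each completion of the missing value yields true every time.

```python
UNKNOWN = 0.5  # Kleene encoding: 0 = false, 0.5 = unknown, 1 = true

def k_and(a, b):
    return min(a, b)

def k_or(a, b):
    return max(a, b)

def k_not(a):
    return 1 - a

def condition(year, mpg, doors):
    """((year=80) AND (doors=2)) OR ((mpg>20) AND (doors=4)) on a complete tuple."""
    return (year == 80 and doors == 2) or (mpg > 20 and doors == 4)

year, mpg = 80, 26  # the fastcars tuple; #ofdoors is missing

# Truth-functional (Kleene) evaluation of (p AND q) OR (r AND NOT q) with q unknown.
p = 1.0 if year == 80 else 0.0
r = 1.0 if mpg > 20 else 0.0
q = UNKNOWN
kleene = k_or(k_and(p, q), k_and(r, k_not(q)))

# Evaluation on every completion of the missing #ofdoors value.
completions = {condition(year, mpg, doors) for doors in (2, 4)}
print(kleene, completions)  # 0.5 {True}
```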

We  have  shown  that  the  use  of  a  particular  three-valued  logic  system  to
formally  treat  missing  values  in  databases  is  not  appropriate  if  we  wish  to
preserve  tautologies.  We  now  show  that  any  n-valued  logic  system  which  is
truth-functional  and  has  unknown  as  one  of  its  truth  values  is  still  not
appropriate  for  our  purposes.  For  a  more  detailed  proof  the  reader  is  referred
to  [Levesque  79].  We  do  not  explicitly  give  the  truth  tables  of  the  n-valued
system,  but  we  impose  two  restrictions  which  are  obviously  satisfied  by  every
truth  table  of  any  normal  system.  First,  if  two  primitive  terms  have  equal  truth
values,  then  their  negations  also  have  equal  truth  values.  Second,  in  a  simple
"or"  formula  we  can  substitute  a  primitive  term  with  another  term  of  equal  truth
value.

In  a  system  where  the  unknown  truth  value  appears,  a  formula  can  only  be
known  to  have  a  particular  truth  value.  Therefore,  to  be  precise  in  our
notation:

The  formula  Q  is  known  to  be  true  iff  V(Q)  is  equal  to  known-true 

where  known-true  is  a  value  denoting  true  in  the  system.  For  any  formula  Q,
the  formula  $(Q \lor \neg Q)$  is  known  to  be  true  (it  is  a  tautology).  Suppose  now
that  we  have  a  database  with  no  data  in  it.  If  we  issue  a  yes-no  query  as  a
formula  Q  on  this  database,  then  the  only  time  the  answer  is  "yes"  is  when  the
query  is  a  tautology.  Hence,

V(Q)  =  known-true  iff  Q  is  a  tautology

Now,  we  can  always  find  distinct  primitive  formulas  p,  q  such  that  $V(p) = V(q)$;
for  instance,  both  of  them  are  interpreted  as  unknown,  or  both  as  known  to  be
false.  By  the  first  restriction  on  the  truth  tables,  $V(\neg p) = V(\neg q)$,  and  by
the  second,  $V(p \lor \neg q) = V(p \lor \neg p)$.  But  $p \lor \neg p$  is  a  tautology,  hence  its
truth  value  is  known-true.  On  the  other  hand,  $p \lor \neg q$  is  not  a  tautology  and
cannot  have  its  truth  value  equal  to  known-true.  We  have  here  a  clear
contradiction.

We  have  demonstrated  that  if  the  unknown  truth  value  is  in  the  n-valued
system,  then,  no  matter  what  the  truth  tables  of  the  system  are,  the
truth-functionality  principle  cannot  be  applied.  There  are  some  purely  syntactic
properties  of  the  data  language  which  are  independent  of  the  data  in  the
database.  Therefore,  tautological  queries  will  be  true,  or  known  to  be  true,  no
matter  what  the  factual  information  is.

Next,  consider  the  case  of  nothing,  which  expresses  indeterminacy.
[Krolokoski  79]  shows  that  the  usual  three-valued  logic  system  cannot,  at  least
plausibly,  be  interpreted  as  a  system  in  which  truths  about  indeterminate
propositions  can  be  deduced.  The  proof  of  this  statement  is  based  on  the
following  argument.  An  indeterminacy  operator  must  either  be  a  primitive
operator  of  the  system  or  be  definable  in  the  three-valued  logic  system;
otherwise,  it  will  not  be  possible  to  express  and  deduce  truths  about
determinate  and  indeterminate  propositions  in  the  system.  But  it  is  shown  that
the  operator  of  indeterminacy  is  neither  a  primitive  (by  examining  all  the
primitives)  nor  definable  from  primitives  (by  attempting  to  define  all  possible
operators  satisfying  certain  plausible  criteria).

In  this  section  we  discussed  Codd's  approach  to  the  formal  treatment  of
null  values.  Tautologies  are  not  preserved  when  this  approach  is  followed,  and
this  is  because  of  the  truth-functionality  requirement.  We  generalized  the  above
to  any  n-valued  truth-functional  logic.  Finally,  we  mentioned  the  result  in
[Krolokoski  79],  which  states  that  a  three-valued  logic  where  the  third  value
expresses  indeterminacy  cannot  be  plausibly  used.

1.3.2.  Approaches  in  the  Framework  of  Two-valued  Logic 

A  general  theory  for  the  treatment  of  incompleteness  appears  in  [Lipski  79].
The  approach  is  not  primarily  concerned  with  traditional  database  management
systems:  an  information  system  is  presented,  and  the  relational  model  of  data  is
then  considered  as  a  special  case.  A  key  feature  in  Lipski's  work  is  the
possibility  of  having  partial  information  about  the  value  which  the  missing  null
represents;  that  is,  nulls  may  take  values  from  subsets  of  the  attribute  domains.
Lipski  assumes  two  interpretations  for  system-user  interaction.  In  what  he  calls
the  external  interpretation,  queries  refer  directly  to  the  real  world,  modelled  in
an  incomplete  way  by  the  system.  In  the  internal  interpretation,  on  the  other
hand,  queries  refer  to  the  system's  information  about  the  real  world.  Basically,
the  system  is  able  to  recognize  tautological  queries  when  the  external
interpretation  is  used.  One  basic  drawback  of  this  approach  is  the  high
computational  complexity  of  query  evaluation,  especially  for  the  internal
interpretation.  Furthermore,  we  find  no  convincing  arguments  for  the
importance  and  usability  of  the  internal  interpretation  in  database  management
applications.  Lipski  treats  only  the  missing  null  and  is  concerned  only  with
queries.
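Partial information of this kind can be sketched by letting each null be the set of values it may stand for. For a one-attribute selection, a tuple is then surely in the answer if every candidate value qualifies and possibly in it if some candidate does. This is our simplified illustration (the set encoding, data and names are assumptions); it does not reproduce Lipski's formal system or his two interpretations.

```python
# A null with partial information is modelled as the set of values it may denote.
employees = [
    {"name": "Smith", "age": {62}},          # known value: a singleton set
    {"name": "Jones", "age": {30, 31, 32}},  # somewhere in 30..32
    {"name": "Brown", "age": {45, 55}},      # either 45 or 55
]

def sure_and_possible(relation, attr, pred):
    """Sure (all candidates qualify) and possible (some candidate qualifies) answers."""
    sure = [t["name"] for t in relation if all(pred(v) for v in t[attr])]
    possible = [t["name"] for t in relation if any(pred(v) for v in t[attr])]
    return sure, possible

print(sure_and_possible(employees, "age", lambda a: a > 50))
# (['Smith'], ['Smith', 'Brown'])
```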

[Mylopoulos  and  Wong  80]  introduce  null  values  as  special  objects  in  the
TAXIS  data  model.  They  place  very  restrictive  limits  on  the  nature  of  queries
that  may  be  asked  on  these  special  objects  and  on  their  semantic
interpretation.  By  placing  these  limits  on  the  types  of  partial  knowledge
allowed,  they  are  able  to  give  very  formal  and  precise  semantics  to  these  special
objects  and  their  interaction  with  other  TAXIS  objects.

Another  approach  to  the  formal  treatment  of  imperfection  is  presented  in
[Biskup  80].  Two  null  values  are  formally  treated:  the  missing  null,  called
existential  in  this  work,  and  a  new  null  called  universal.  The  universal  null  does
not  have  well-defined  semantics  and  is  only  used  for  theoretical  purposes.  The
approach  follows  many  of  the  ideas  in  [Codd  79]  but  remains  in  the  framework  of
two-valued  logic.  Only  queries  are  considered,  and  the  claim  is  that  query
evaluation  is  very  economical.  Regrettably,  we  could  not  find  any  justification
for  this  claim  in  [Biskup  80].  In  addition,  we  object  to  the  use  of  concepts  (e.g.
the  universal  null  value)  for  which  no  pragmatic  example  is  given  and  which  are
present  only  for  theoretical  purposes.

We  now  discuss  some  recent  contributions  on  various  aspects  of  the  general
problem  of  treating  nulls  in  relational  databases.  [Lien  79]  looked  at
multivalued  dependencies  in  the  presence  of  null  values  which  are  left
uninterpreted.  Related  to  Lien's  work  is  that  of  [Walker  80].  Walker  considers  a
universal  relation  model  (i.e.  a  single  table)  as  the  alternative  to  decomposed
relational  databases.  A  single  table  with  null  entries,  together  with  a  set  of
dependencies,  including  both  functional  and  multivalued  dependencies,  can  take
two  forms.  In  one  form  the  information  is  held  implicitly  (i.e.  no  attempt  is  made
to  determine  the  actual  value  which  a  null  in  the  table  represents).  In  the  other
form  the  information  is  held  explicitly:  the  dependencies  are  used  to  eliminate
as  many  nulls  as  possible.  There  is  a  space-time  trade-off  between  the  two
forms.  Less  space  is  needed  in  the  implicit  form,  but  heavy  computation  is
required  for  most  retrievals.  The  method  of  retrieval  is  not  described  in
[Walker  80].

[Maier  80],  [Goldstein  80]  and  [Honeyman  80]  use  null  values  in  attempts  to
resolve  the  problems  which  the  universal  relation  assumption  introduces  in
relational  database  design.  [Sciore  80]  and  [Korth  and  Ullman  80]  leave  the  null
values  uninterpreted.  Their  work  concentrates  on  the  behaviour  of  a  universal
relation  model  under  the  requirements  of  dependency  satisfaction.

Finally,  we  mention  Grant's  proposal  [Grant  77],  which  gives  the  intuitive
foundation  of  our  work.  Grant  recognizes  that  the  three-valued  logic  arising  in
connection  with  the  missing  value  is  not  truth-functional.  He  proposes  a  method
for  deciding  whether  a  tuple  with  a  missing  null  is  in  the  result  of  a  query,
which  essentially  involves  substituting  for  the  null  in  all  possible  ways.  This
short  paper  is  very  informal  and  is  not  concerned  with  efficiency  in  query
evaluation.


1.4.  Summary  of  our  Approach 

In  this  thesis  we  deal  with  imperfection  in  databases  in  the  following  way.
Inconsistencies  of  the  type  that  the  null  value  nothing  represents  are  only
detected  as  such;  no  attempt  is  made  to  resolve  them.  Thus,  when  we  evaluate
database  queries  on  nothing  values,  we  report  the  inconsistency.  On  the  other
hand,  our  treatment  of  the  missing  null  value  is  more  sophisticated.  Informally,
a  database  with  missing  null  values  is  in  an  incomplete  state,  and  any
substitution  of  these  nulls  with  regular  values  completes  the  state.  A  completed
state  represents  a  possible  configuration  of  the  database  if  all  information  were
available.  Our  approach  requires  the  consideration  of  all  possible  completions  of
the  database  before  any  database  operation  (e.g.  query,  modification,
enforcement  of  semantic  integrity  constraints)  can  be  performed.  The  operation
is  then  performed  on  all  completions.  If  the  result  of  the  operation  is  the  same
on  all  completions,  then  we  may  safely  conclude  that  the  database  imperfection
is  not  important.  On  the  other  hand,  there  may  be  cases  where  the  result  of  the
operation  differs  between  completions.  In  such  cases,  the  answer  (result)  to  the
operation  is  "unknown"  or  "maybe"  or  "can't  tell",  since  it  is  not  known  which  is
the  appropriate  completion.  Note  that  in  this  context  an  inconsistency  (the
nothing  null  value)  will  remain  in  all  completions.  We  have  a  formal  and  natural
framework  to  capture  our  intuition:  qualitative  approximations,  lattices  and  least
extensions  in  the  mathematical  theory  of  computation  [Scott  71]  serve  this
purpose.
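The completion semantics just described can be phrased as a small generic evaluator. The Python sketch below is our own illustration with hypothetical names: the sentinels MISSING and NOTHING, a finite domain passed in explicitly, and an operation given as a function of the listed values. Because it enumerates all completions, it also exhibits the cost problem raised in the next paragraph: with k missing values and a domain of size d there are d^k completions.

```python
from itertools import product

MISSING, NOTHING = "missing", "nothing"  # sentinels for the two null values

def evaluate_over_completions(operation, values, domain):
    """Perform `operation` on every completion of `values`.

    Each MISSING is replaced by every value of the finite `domain`. A NOTHING
    survives in every completion, so it is only reported as an inconsistency.
    """
    if NOTHING in values:
        return "inconsistent"
    choices = [domain if v == MISSING else (v,) for v in values]
    results = {operation(*completion) for completion in product(*choices)}
    return results.pop() if len(results) == 1 else "unknown"

SALARIES = range(0, 20001, 1000)  # hypothetical finite salary domain

def someone_above_15000(*salaries):
    return max(salaries) > 15000

print(evaluate_over_completions(someone_above_15000, [12000, MISSING, 16000], SALARIES))  # True
print(evaluate_over_completions(someone_above_15000, [12000, MISSING, 9000], SALARIES))   # unknown
print(evaluate_over_completions(someone_above_15000, [NOTHING, 5000], SALARIES))          # inconsistent
```

In the first call the imperfection is unimportant: every completion gives the same answer.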

There  are,  of  course,  several  problems.  We  cannot  work  with  completions  in
practice,  simply  because  there  are  too  many  of  them.  However,  we  are  able  to
solve  this  problem,  in  most  cases,  by  methods  which  are  equivalent  to  the
methods  suggested  by  the  basic  definitions  but  do  not  require  substitution  of
null  values  (completions).

This  thesis  is  logically  divided  into  four  parts.  In  the  first  part,  which  is  the
present  chapter,  we  introduce  the  problem  and  present  the  motivation  for  its
formal  solution.  In  addition,  we  discuss  other  approaches  towards  the  solution  of
the  problem.

In  the  second  part  we  present  the  query  and  modification  operations  on  a
database  with  imperfect  information.  In  order  to  have  a  clear,  well-defined
framework  and  to  illustrate  our  concepts  with  specific  examples,  we  concentrate
on  one  database  model  -  the  relational  model.  Chapter  2  presents  our  formal
framework.  We  start  with  introductions  to  the  relational  model  of  data,  to  formal
database  semantics  and  to  the  basic  results  of  the  mathematical  theory  of
computation.  Imperfection  is  then  considered  in  a  relational  database.  The  null
values  missing  and  nothing  are  embedded  in  database  domains,  where  they  have
a  special  relationship  with  the  other  regular  values:  missing  is  an  approximation
to  all  regular  values,  while  the  latter  all  approximate  nothing.  The  informal
development  of  the  general  function  (operation)  extension  rule  is  then  presented.
The  rule  is  formalized  and  justified  semantically  and  syntactically  in  the
framework  of  the  mathematical  theory  of  computation,  where  it  is  called  the  least
extension  rule.  Among  the  interesting  properties  of  this  rule,  the  most  important
is  that  the  rule  does  not  apply  smoothly  to  function  compositions.  That  is,  the
least  extension  of  the  composition  of  several  functions  is  not  necessarily  the
same  as  the  composition  of  the  least  extensions  of  these  functions.  As  a  result,
the  truth-functionality  principle  cannot  be  applied  in  our  context.
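The failure of compositionality can be seen on a two-element domain. In this sketch (our reading, not the thesis's formal definition) the extension of f maps arguments containing missing to f's common value over all completions, or to missing when the completions disagree, i.e. the greatest lower bound in the ordering where missing lies below every regular value; nothing is omitted for brevity. Extending p ∨ ¬p as a whole gives true, while composing the separately extended ∨ and ¬ gives missing, which is the three-valued behaviour criticized in chapter 1.

```python
from itertools import product

MISSING = "missing"   # approximates (lies below) every regular value
BOOL = (False, True)  # the regular values of a two-element domain

def extend(f, domain=BOOL):
    """Extend f to arguments that may be MISSING: return f's common value over
    all completions of the arguments, or MISSING if the completions disagree."""
    def extended(*args):
        choices = [domain if a == MISSING else (a,) for a in args]
        results = {f(*c) for c in product(*choices)}
        return results.pop() if len(results) == 1 else MISSING
    return extended

neg = extend(lambda a: not a)
disj = extend(lambda a, b: a or b)
excluded_middle = extend(lambda a: a or not a)  # extension of the composition

print(excluded_middle(MISSING))     # True
print(disj(MISSING, neg(MISSING)))  # missing: composition of the extensions
```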

In  chapter  3,  queries  are  considered  as  functions  on  a  database,  and  the
least  extension  rule  is  applied  so  that  they  can  be  defined  on  the  null  values.  We
demonstrate  how  the  tautology  problem  in  Codd's  approach  is  solved  with  our
interpretation.  Furthermore,  we  present  an  alternative  method  for  query
evaluation  which  is,  at  worst,  exponential  in  the  number  of  simple  terms  of  the
query.  This  is  contrasted  with  the  direct  application  of  the  least  extension  rule,
which  requires  time  exponential  in  the  size  of  attribute  domains.  It  is  shown
that  very  little  hope  exists  for  a  query  evaluation  algorithm  with  polynomial
worst-case  behaviour,  since  the  query  evaluation  problem  is  co-NP-complete.  We
also  discuss  modification  operations  in  the  presence  of  imperfection,  presenting,
at  a  high  level  of  abstraction,  the  semantics  of  tuple  insertion  and  deletion.
Chapters  4  and  5  constitute  the  third  part  of  the  thesis.  In  chapter  4  we
consider  the  interpretation  of  functional  dependencies  (FDs)  on  relations  with
null  values.  Since  we  are  only  interested  in  consistent  databases,  there  is  only
one  manifestation  of  null  in  this  context:  the  missing  null.  Any  appearance  of
the  nothing  null  value  would  result  in  an  inconsistency.  A  relation  instance
where  all  FDs  are  satisfied  is  called  consistent.  When  we  test  for  FD  satisfiability
on  all  completions  of  a  relation  with  nulls  and  always  get  the  same  result  (i.e.
always  true  or  always  false),  then  the  presence  of  nulls  in  the  relation  is  not
important  for  its  consistency  test.  This  is  a  very  strong  and  restrictive  notion  of
consistency.  Alternatively,  a  weaker  notion  of  consistency  requires  the
existence  of  at  least  one  completion  where  all  FDs,  tested  without  nulls,  are
satisfied.
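Both notions can be stated directly in terms of completions. The Python sketch below is a brute-force illustration under our own assumptions: None marks the missing null, FDs are pairs of attribute lists, one finite domain serves every null, an FD is strongly satisfied if it holds in every completion and weakly satisfied if it holds in some completion, and weak consistency of a set of FDs needs one completion satisfying all of them. The example relation shows A → B and B → C each weakly satisfied when tested on its own, while A → C is not and the pair is not weakly consistent. This illustrates why, under the weak reading, transitivity cannot be used to infer A → C.

```python
from itertools import product

MISSING = None  # the missing null in this sketch

def satisfies_fd(rel, lhs, rhs):
    """Ordinary test of the FD lhs -> rhs on a null-free relation (list of dicts)."""
    seen = {}
    for t in rel:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def completions(rel, domain):
    """Yield every relation obtained by replacing each MISSING cell by a domain value."""
    cells = [(i, a) for i, t in enumerate(rel) for a, v in t.items() if v is MISSING]
    for values in product(domain, repeat=len(cells)):
        filled = [dict(t) for t in rel]
        for (i, a), v in zip(cells, values):
            filled[i][a] = v
        yield filled

def strongly_satisfied(rel, lhs, rhs, domain):
    return all(satisfies_fd(c, lhs, rhs) for c in completions(rel, domain))

def weakly_satisfied(rel, lhs, rhs, domain):
    return any(satisfies_fd(c, lhs, rhs) for c in completions(rel, domain))

def weakly_consistent(rel, fds, domain):
    """One completion must satisfy all the FDs at once."""
    return any(all(satisfies_fd(c, l, r) for l, r in fds) for c in completions(rel, domain))

r = [{"A": "a", "B": MISSING, "C": "c1"},
     {"A": "a", "B": MISSING, "C": "c2"}]
dom = ["b1", "b2"]  # hypothetical domain for the nulls under B
print(weakly_satisfied(r, ["A"], ["B"], dom))    # True: fill both B cells with b1
print(weakly_satisfied(r, ["B"], ["C"], dom))    # True: fill them with b1 and b2
print(weakly_satisfied(r, ["A"], ["C"], dom))    # False: no null is involved
print(strongly_satisfied(r, ["A"], ["B"], dom))  # False: b1, b2 violates A -> B
print(weakly_consistent(r, [(["A"], ["B"]), (["B"], ["C"])], dom))  # False
```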

We  again  face  the  problem  of  practicality  in  dealing  with  completions.  We 
are  able  to  present  tests  for  both  strong  and  weak  consistency  which  do  not 
require  substitution  of  null  values.  These  tests  require  essentially  the  same 
computational  effort  as  in  the  case  of  relations  with  no  nulls. 

For  strong  consistency,  we  present  sound  and  complete  inference  rules  for 
FDs  and  a  way  to  test  consistency  with  exactly  the  same  algorithm  which  is  used 
in  relations  with  no  nulls. 

For  weak  consistency,  the  transitivity  inference  rule  for  FDs  is  not  sound.
When  an  FD  is  satisfied  in  a  relation,  something  more  may  be  known  about  the
possible  values  that  the  nulls  in  the  relation  represent.  We  have  rules,  called
null-substitution  rules,  that,  when  applied,  give  us  this  extra  knowledge  about
nulls.  Null-substitution  rules  either  substitute  a  null  with  a  constant  value  or
equate  null  values,  thus  introducing  equivalence  classes  of  nulls.  The  rules  can
be  applied  only  a  finite  number  of  times  and  in  any  order.  For  relation  instances
produced  after  their  application,  the  consistency  test  can  be  made  with  the
same  algorithm  used  for  relations  with  no  nulls.
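Rules of this kind resemble the well-known chase for functional dependencies over labelled nulls. The sketch below implements that standard chase, not the thesis's own rule set. Nulls are strings "?1", "?2", ... (a hypothetical convention), and FDs are pairs (left-hand attributes, right-hand attribute). Each step removes one null label, so the procedure terminates. In the first relation, equating the two B-nulls exposes a clash on C. In the second, the nulls are simply filled in, after which the ordinary FD test applies.

```python
def is_null(v):
    # Hypothetical marker: labelled nulls are strings "?1", "?2", ...
    return isinstance(v, str) and v.startswith("?")

def chase(rel, fds):
    """Apply FD-driven null substitutions until no rule applies.

    For an FD X -> A and two tuples with equal X-values (a shared null label
    counts as equal, since both positions denote the same unknown value):
      * a null and a constant under A: the null is replaced by the constant;
      * two different nulls under A: they are equated (one renames the other);
      * two different constants under A: the FD fails in every completion.
    Returns the new relation, or None when an FD is violated.
    """
    rel = [dict(t) for t in rel]
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            for t1 in rel:
                for t2 in rel:
                    if any(t1[x] != t2[x] for x in lhs) or t1[a] == t2[a]:
                        continue
                    if not is_null(t1[a]) and not is_null(t2[a]):
                        return None
                    old, new = (t1[a], t2[a]) if is_null(t1[a]) else (t2[a], t1[a])
                    for t in rel:  # substitute everywhere the null occurs
                        for attr in list(t):
                            if t[attr] == old:
                                t[attr] = new
                    changed = True
    return rel

fds = [(["A"], "B"), (["B"], "C")]
clash = [{"A": "a", "B": "?1", "C": "c1"}, {"A": "a", "B": "?2", "C": "c2"}]
print(chase(clash, fds))  # None
fills = [{"A": "a", "B": "b", "C": "?1"}, {"A": "a", "B": "?2", "C": "c"}]
print(chase(fills, fds))  # [{'A': 'a', 'B': 'b', 'C': 'c'}, {'A': 'a', 'B': 'b', 'C': 'c'}]
```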

In  chapter  5  we  apply  our  results  to  the  theory  of  relational  database
design.  In  particular,  we  demonstrate  our  ability  to  weaken  the  universal  relation
assumption  with  the  use  of  nulls,  while  still  retaining  the  theoretical  benefits  of
its  invocation.  For  a  multi-relation  database,  consistency  has  to  be  tested  on  a
universal  instance  somehow  corresponding  to  the  database.  In  our  weaker
notion,  we  do  not  require  the  database  to  be  join-consistent  as  a  necessary
condition  for  satisfiability  of  functional  dependencies.  The  test  for  consistency
need  not  be  performed  on  the  join  of  all  relations,  but  on  another  universal
relation  instance.  This  instance  is  generated  from  the  database  when  every
tuple  in  the  database  is  considered  as  a  tuple  of  the  universal  relation,
unavoidably  with  null  values  for  some  of  the  attributes.  It  is  argued  that  this
instance  is  in  a  natural  correspondence  with  the  database  and  that  the  test  for
database  consistency  is  correct.  Furthermore,  the  test  can  be  done  in
polynomial  time.  This  is  contrasted  with  the  universal  relation  instance  test,
which  is  exponential.  Finally,  in  databases  that  pass  this  consistency  test,  there
are  no  modification  anomalies  (the  undesirable  effects  of  the  UR-assumption).
We  show  that  this  database  consistency  test,  called  global,  is  equivalent  to  the
test  which  is  based  on  the  UR-assumption  for  join-consistent  databases.
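A minimal sketch of how such an instance can be built, assuming (our encoding) that None stands for the missing null and that tuples are dicts. Every database tuple is padded with nulls on the attributes outside its scheme. Projecting back with a total projection, which keeps only tuples that are non-null on the whole scheme, returns each relation even though this hypothetical database is not join-consistent (Bob has no matching pet tuple). Only the construction is shown; the global consistency test itself is not reproduced here.

```python
MISSING = None  # stands for the missing null in this sketch

def universal_instance(database, universe):
    """Treat every tuple of every relation as a universal-relation tuple,
    filling the attributes outside its scheme with MISSING."""
    return [{a: t.get(a, MISSING) for a in universe} for rel in database for t in rel]

def total_projection(ur, scheme):
    """Projection onto `scheme` of the tuples that are non-null on all of `scheme`."""
    return {tuple(t[a] for a in scheme)
            for t in ur if all(t[a] is not MISSING for a in scheme)}

universe = ("apt", "occupant", "pet")
r1 = [{"apt": 1, "occupant": "Ann"}, {"apt": 2, "occupant": "Bob"}]  # Bob has no pet
r2 = [{"occupant": "Ann", "pet": "cat"}]
ur = universal_instance([r1, r2], universe)
for t in ur:
    print(t)
print(sorted(total_projection(ur, ("apt", "occupant"))))  # [(1, 'Ann'), (2, 'Bob')]
print(sorted(total_projection(ur, ("occupant", "pet"))))  # [('Ann', 'cat')]
```

The instance grows linearly with the database, unlike a join of the relations.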

Even  though  the  global  consistency  test  requires  time  polynomial  in  the
database  size,  it  is  still  too  expensive  for  any  practical  use  as  part  of  any
database  modification.  We  propose  two  easier  consistency  tests  which,  as
expected,  are  not  correct  for  all  databases  (i.e.  are  not  equivalent  to  the  global
test).  We  have  found  some  schemas  for  which  the  tests  on  their  instances  are
correct.  The  general  case  is  an  open  problem.

The  final  part  of  this  thesis  has  one  chapter.  In  this  chapter  conclusions  are
drawn,  the  thesis  results  are  summarized,  the  contributions  are  presented,  and
possible  extensions  are  considered.


PART  II


OPERATIONS  IN  THE  PRESENCE  OF  IMPERFECTION 


In  this  part  we  describe  our  basic  framework.  First,  a  brief  introduction  to
relevant  aspects  of  the  relational  model  is  given  in  chapter  2.  Operations
(queries  and  modifications)  in  the  model  are  defined  as  functions.  We  adopt  the
predicate  calculus  view  for  the  query  aspects  of  the  model  in  this  context.  A
short  introduction  to  formal  database  semantics  is  also  presented.  We  then
consider  the  possibility  of  introducing  capabilities  to  deal  with  imperfection  in
the  relational  model.  Imperfection  takes  the  form  of  null  values.  We  show  how
each  domain,  together  with  the  two  null  values  (missing  and  nothing)  and  a
qualitative  partial  ordering,  becomes  a  lattice.  A  general  rule  for  operation
(function)  extensions  is  presented,  and  its  selection  is  justified  on  both  syntactic
and  semantic  grounds.  The  justification  is  made  with  the  use  of  basic  lattice
theory  and  the  main  results  of  the  mathematical  theory  of  computation.  In
chapter  3  we  use  an  extension  rule  for  query  evaluations.  We  also  present  an
alternative  (but  equivalent)  method  for  query  evaluations,  which  is  shown  to  be
more  economical  computationally  than  the  straightforward  application  of  the
basic  extension  rule.  In  the  final  section  of  this  chapter  we  discuss  modification
operations  in  the  presence  of  imperfection.


Chapter  2 


The  Framework 


I've  got  plenty  of  nothing 
-  Ira  Gershwin 


2.1.  The  Relational  Model 

Traditionally,  relations  are  considered  as  subsets  of  the  Cartesian  product 
of  a  list  of  sets  of  values  (domains).  An  element  of  a  set-theoretic  relation  is 
called  a  tuple.  That  is,  a  tuple  is  a  list  of  values  where  each  of  the  values  belongs 
to  a  domain  in  the  Cartesian  product.  It  helps  conceptually  to  view  relations  as 
tables  where  each  row  is  a  tuple  and  each  column  is  named.  The  names  of  the 
columns  are  called  attributes. 

In this thesis we adopt an alternative formulation of relations. The main difference from the traditional formulation is that tuples are now considered as maps (functions) from attributes to domains. [Ullman 80] points out that this difference is only technical and that there is an obvious method of converting between the two formulations. More formally, the universe, U, is a set of attributes $A_1, A_2, \ldots, A_n$. Subsets of the universe are denoted by the letters X, Y, Z, R, S, ... A subset of the universe is called a relation scheme. Each attribute $A_i$ in the universe has an associated domain $D_{A_i}$. The domains may be sets of integers, strings, reals, dates, etc. Note that domains may be very large but are always finite. For instance, the number of names is potentially infinite, but in a database the maximum number of elements in the domain of the attribute name is determined by the maximum allowed length of a character string (the physical representation of a name) that can be stored. A tuple t is a function from a set of attributes to the Cartesian product of the associated domains. Thus, if $R = \{A_1, A_2, \ldots, A_n\}$ is a subset of U, then:

$$ t : R \rightarrow \prod_{i=1}^{n} D_{A_i} = D_R $$

A relation instance, or simply relation, r, is defined as a set of tuples. A database schema $\rho$ is a list of relation schemes, denoted by $\rho = (R_1, R_2, \ldots, R_k)$. An instance of $\rho$, represented as $[r_1, r_2, \ldots, r_k]$, is called a relational database. Without loss of generality, we assume that there are no databases in which distinct relations have the same scheme.

A relation can be interpreted as a predicate on tuples. For example, the relation with scheme [name, age] is interpreted on the tuple (John, 40) as "John is 40 years old". We say that t belongs to an instance r of R if r(t) = true. Negative information in a relation instance r is indicated by tuples that are not in the instance (closed-world assumption).
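A minimal Python sketch of this formulation, assuming a tuple is stored as a frozen set of (attribute, value) pairs so that a relation instance can be an ordinary set. The helpers `make_tuple` and `holds` and the second stored tuple are hypothetical illustrations, not part of the thesis.

```python
from typing import FrozenSet, Set, Tuple

# A tuple t : R -> D_R, stored as hashable (attribute, value) pairs.
Row = FrozenSet[Tuple[str, object]]


def make_tuple(**values: object) -> Row:
    """Build a tuple from attribute=value keyword arguments."""
    return frozenset(values.items())


def holds(r: Set[Row], t: Row) -> bool:
    """r(t): true iff t is stored in r; any absent tuple is false (closed world)."""
    return t in r


# A relation instance with scheme [name, age] (the second tuple is hypothetical).
people = {make_tuple(name="John", age=40), make_tuple(name="Mary", age=35)}

print(holds(people, make_tuple(name="John", age=40)))  # True: "John is 40 years old"
print(holds(people, make_tuple(name="John", age=41)))  # False by the closed-world assumption
```

Because a tuple is a map, the order in which attributes are listed does not matter: `make_tuple(age=40, name="John")` denotes the same tuple.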

Relational  data  languages  have  a  rich  set  of  commands  for  manipulation  of 
relations  and  for  their  interpretation.  We  distinguish  two  aspects  of  a  data 
language: 

(a)  query  aspect  -  i.e.  the  lookup  operation  on  relations; 

(b)  nonquery  aspect  -  i.e.  the  modification  operations  on  relations. 

Relational languages fall into two broad classes. In algebraic languages, queries are applied as operators defined on relations that produce other relations (i.e. the responses to the queries). In predicate calculus based query languages, the response to a query is the set of tuples that satisfy a predicate specified in the query. Alternatively, viewing queries as applied to individual tuples, a tuple belongs to the response if the predicate of the query, evaluated on the values of the tuple, is true. We will denote a query by its predicate Q. Let T be the set of truth values {true, false} and t a function from the attribute set $X \subseteq U$ to the Cartesian product domain $D_X$. We formally define the interpretation of Q on t as the function:

$$ Q : [X \rightarrow D_X] \rightarrow T $$

We say that t belongs to the (true)-answer of Q if Q(t) = true.
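The predicate-calculus view can be sketched in Python by treating a query as a plain function from a tuple (here a dict from attribute names to values) to a Boolean. The names `true_answer` and `over_fifty` and the sample data are hypothetical; this is a sketch of the idea, not an implementation from the thesis.

```python
from typing import Callable, Dict, List

TupleMap = Dict[str, object]           # t : X -> D_X
Query = Callable[[TupleMap], bool]     # Q : [X -> D_X] -> T


def true_answer(q: Query, relation: List[TupleMap]) -> List[TupleMap]:
    """The (true)-answer of Q: the tuples t with Q(t) = true."""
    return [t for t in relation if q(t)]


def over_fifty(t: TupleMap) -> bool:
    """Hypothetical predicate: miles per gallon above fifty."""
    return t["mpg"] > 50


cars = [{"make": "turtles", "mpg": 55}, {"make": "fastcars", "mpg": 26}]
print(true_answer(over_fifty, cars))  # [{'make': 'turtles', 'mpg': 55}]
```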

In the next section we discuss the semantics of databases. The basic concepts of the relational model are then reexamined in the presence of imperfection within a formal semantic framework.

2.2.  Formal  Semantics  of  a  Database 

Informally, the semantics of a database is the meaning the database conveys to the user. We may think of semantics as the interpretation of the data and the data operations, all in the framework of a data model (e.g. the relational model). If we define database states as user abstractions of the real world at specific instants of time, then transitions between database states reflect the dynamic changes of the real world. Given these informal notions, we may define the formal semantics of the languages used in the data model (the data definition language, ddl, and the data manipulation language, dml). In essence, we map the 'syntactic constructs' used in the languages to their 'meanings'.

The view taken in [Biller et al 76] is that the data definition and data manipulation languages are sets of expressible ddl and dml programs. As an example, a large part of the CODASYL ddl and dml is given a formal semantic description. In giving the formal semantics, [Biller et al 76] follow the exact syntax of the language, including some COBOL statements, thus introducing a lot of low-level detail. Their goal is to provide a tool for verifying CODASYL language implementations. Our goal in this thesis is different. We are interested in examining the behavior of existing high-level languages in the presence of imperfection in the database. Since we work at a much higher level of abstraction, we do not have to give the exact syntax of the data model description languages. We give meanings not to the individual statements of a dml program but to the program itself (e.g. insert, delete, etc.).

At this level, there are three basic approaches to giving the formal semantics of languages: axiomatic, operational and denotational semantics. In axiomatic (or formal) semantics, axioms and proof rules are used for formal proofs of program properties, an implicit way to assign meanings; the notion of state is suppressed in this approach. The other two approaches are model-theoretic, that is, meaning is attributed to programs by their relation to a model. In operational semantics, abstract machines are used as models and every program instruction is a state-transforming action of the machine. The initial state, all intermediate states and the final state are explicitly given.

We adopt the denotational semantics approach for assigning meanings to database operations. In denotational semantics, developed by Scott and Strachey ([Scott 71], [Donahue 74], [Stoy 77], [Reynolds 72]), definitions are always at a higher level of abstraction than operational definitions, since the intermediate states of a state transition are suppressed. The approach relies solely on basic mathematical notions such as sets, functions, continuity and operations. This allows the use of mathematical techniques (e.g. induction) in proofs. Furthermore, recursion, which appears constantly, sometimes implicitly, in database models, is a basic element of denotational semantics and is treated very elegantly with least fixed points.

The concept of qualitative approximation is central to the theory of mathematical or denotational semantics. We say that an element x in a domain D is approximated by another element y in D if x is more "precise" or "smooth" than y. There are two basic results of denotational semantics. The first states that a data type is a complete lattice, and the second states that "allowable" mappings between data types are continuous (they preserve limits and do not increase information). The use of approximation is motivated by the fact that data types, traditionally considered as sets of objects, naturally include infinite objects for which no finite representation can be given. We can define these infinite objects as the limits of sequences of finite approximations. A data type together with the approximation relation introduced above is a complete lattice, and the infinite objects (limits) are naturally the glb and lub of such sequences. Mappings between lattices are restricted to be continuous.† If we regard approximation as a measure of the information content of domain elements (how precise or accurate they are), then it is clearly undesirable to allow mappings between lattices that increase or decrease the information content of an element. In addition, the notion of continuous function corresponds closely to the notion of effectively computable function [Scott 76]: such functions produce a result after considering only a finite amount of input.

† Continuity reduces to monotonicity for finite domains. A function f : A → B is monotonic if, for every x, y such that x approximates y, f(x) approximates f(y).

2.3. Imperfection in a Relational Database

Introducing special values like missing and nothing in a database is, from a syntactic point of view, very simple. One simply associates these special values with the sets of regular values that appear in the database. For instance, in a relational database, for every attribute (e.g. employee-salary) we have a set of allowable values called the attribute domain (e.g. integers ranging from 10000 to 50000). Every such domain is extended to include the new valid values missing and nothing. From a semantic point of view, though, two questions arise.

a) What is the relationship (if any) of these special values to the other regular values?

b)  How  do  we  extend  all  the  operations  on  domains  when  the  domains  contain 
these  special  values? 

In  this  section  we  answer  the  first  question.  The  second  question  is 
addressed  in  the  sequel. 

To see the relationship of missing and nothing to other domain values, we introduce a qualitative sense of approximation in all domains. Of course, not all elements in a domain need be related by this approximation relation. For example, in employee-salary, it makes no sense to say that $10000 is more precise than $20000. On the other hand, the salaries missing and nothing play a very interesting role in this approximation relation. Missing may, by definition, be any one of the possible salary values. In essence this means that missing approximates every other salary value; it is less precise as a value. The salary value nothing is an impossible salary value, a value more precise than any other salary value. It is a value which is, at the same time, all possible values, i.e. it is approximated by all salary values. It is easy to verify that the attribute domain D, together with the two special values and the approximation relation, is a complete lattice. The greatest lower bound (glb) and least upper bound (lub) of this lattice are the nothing and missing values, respectively.

More formally, consider a domain D. Let ϑ (top) and ω (bottom) denote the imperfect elements. We define an approximation relation ⊑ (a partial order) on $D^\circ = D \cup \{\omega, \vartheta\}$ as:

(a) $\vartheta \sqsubseteq v \sqsubseteq \omega$ for every $v \in D^\circ$;

(b) $v_1 \sqsubseteq v_2$ iff $v_1 = v_2$, for every $v_1, v_2 \in D$.

Trivially, $\langle D^\circ, \sqsubseteq \rangle$ is a primitive or simple lattice.† Examples of simple lattices are given in figure 2.1. We keep the same symbols (ω and ϑ) for the imperfect elements throughout this thesis, but by convention we refer to them by different names in different domains. Thus, in the domain of truth values ω is called inconsistent and ϑ is called unknown, whereas in any attribute domain ω is called nothing and ϑ is called missing.
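The approximation relation of a simple lattice can be sketched directly from rules (a) and (b). In this Python sketch the `Null` enumeration standing for ω and ϑ, and the function name `approximates`, are illustrative assumptions; regular values are any ordinary Python values.

```python
from enum import Enum


class Null(Enum):
    OMEGA = "ω"   # nothing (attribute domains) / inconsistent (truth values)
    THETA = "ϑ"   # missing (attribute domains) / unknown (truth values)


def approximates(x: object, y: object) -> bool:
    """x ⊑ y in the simple lattice D° = D ∪ {ω, ϑ}.

    Rule (a): ϑ approximates every element and every element approximates ω.
    Rule (b): two regular values are related only when they are equal.
    """
    if x is Null.THETA or y is Null.OMEGA:
        return True
    if x is Null.OMEGA or y is Null.THETA:
        return False
    return x == y


print(approximates(Null.THETA, "Jan"))   # True: missing approximates every month
print(approximates("Jan", Null.OMEGA))   # True: every month approximates nothing
print(approximates("Jan", "Feb"))        # False: distinct months are incomparable
```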

From simple lattices we may generate more complicated lattices using domain operations. In particular, we define the Cartesian product lattice. Let $X = \{A_1, A_2, \ldots, A_n\}$ with the corresponding domain $D_X$. Then,

$$ D^\circ_X = \prod_{i=1}^{n} D^\circ_{A_i} $$

Approximation is defined between elements of $D^\circ_X$ as follows. Let $t = (v_1, v_2, \ldots, v_n)$ and $t' = (v'_1, v'_2, \ldots, v'_n)$; then

$$ t \sqsubseteq_{D^\circ_X} t' \iff v_i \sqsubseteq_{D^\circ_{A_i}} v'_i \quad \text{for every } i,\ 1 \le i \le n $$

An example of such a lattice is given in figure 2.2.

For each tuple t in $D^\circ_X$ we associate two uniquely determined sets of tuples:

1. $AP(t) := \{ t' \mid t \sqsubseteq t' \text{ and } t' \in D^\circ_X \}$
2. $COMPL(t) := \{ t' \mid t \sqsubseteq t' \text{ and } t' \in D_X \}$

AP(t) is the set of all tuples that t approximates, and COMPL(t) is the set of all tuples with no imperfect values that t approximates (i.e. the completions of t). These sets will be used frequently in the sequel. We conclude this section with the observation that, when the imperfect elements (null values) are introduced in a relational database, tuples are defined as maps from attribute sets to their extended Cartesian product domains: $t : X \rightarrow D^\circ_X$, $X \subseteq U$.

† When making assertions about domains we should, to be precise, qualify symbols such as ⊑ and ω to indicate the domain involved. For example, in the above definition we should write $v_1 \sqsubseteq_{D^\circ} v_2$ and $\omega_D$. However, we will omit such qualifications where the context makes them obvious.

Figure 2.1.(a) Attribute Domain Extension. D is the domain for the attribute MONTH, D = {Jan, Feb, ..., Dec}, and D° = D ∪ {ω, ϑ}, where ϑ (missing) approximates every month.

Figure 2.1.(b) Extension of the Domain of Truth Values. T is the domain of truth values, T = {true, false}, and T° = T ∪ {ω, ϑ}, where ϑ is unknown (either true or false) and ω is inconsistent (both true and false).

Figure 2.2. A Cartesian Product Domain Extension.
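A small Python sketch of componentwise approximation on $D^\circ_X$ and of COMPL(t), assuming finite attribute domains supplied as Python iterables. The function names and the year range are hypothetical; AP(t) is not enumerated here because it also ranges over imperfect components. A regular component can only be completed by itself, ϑ by any regular value, and ω by none.

```python
from enum import Enum
from itertools import product
from typing import Iterable, List, Sequence, Tuple


class Null(Enum):
    OMEGA = "ω"   # nothing
    THETA = "ϑ"   # missing


def approximates(x: object, y: object) -> bool:
    """x ⊑ y in a simple lattice D° (rules (a) and (b))."""
    if x is Null.THETA or y is Null.OMEGA:
        return True
    if x is Null.OMEGA or y is Null.THETA:
        return False
    return x == y


def tuple_approximates(t: Sequence, u: Sequence) -> bool:
    """t ⊑ t' in D°_X: componentwise approximation."""
    return len(t) == len(u) and all(approximates(v, w) for v, w in zip(t, u))


def compl(t: Sequence, domains: Sequence[Iterable]) -> List[Tuple]:
    """COMPL(t): the tuples of D_X (no imperfect values) that t approximates."""
    choices = []
    for v, dom in zip(t, domains):
        if v is Null.OMEGA:
            return []                       # ω approximates no regular value
        choices.append(list(dom) if v is Null.THETA else [v])
    return [tuple(c) for c in product(*choices)]


# Attributes (year, #ofdoors); the year range is a hypothetical finite domain.
DOMAINS = [range(70, 90), [2, 4]]
print(compl((80, Null.THETA), DOMAINS))     # [(80, 2), (80, 4)]
print(compl((79, Null.OMEGA), DOMAINS))     # []
```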

2.4.  Extension  of  Functions  Between  Domains 

Having introduced the imperfect elements in a relational database, we turn our attention to the operations that may be performed when these elements are present. The operations (functions) on the database must be "extended" to apply to the imperfect elements. Unfortunately, there is no universally suitable "standard" way of choosing extensions. Our effort in the sequel will be to select the appropriate type of extension for our purposes. The selection criteria will be both semantic and syntactic. Our discussion starts with an informal presentation and justification of an extension rule, illustrated by an example. The rule is then formalized and its syntactic properties are analyzed.

In a two-valued logic system, we define, using truth tables, logical connectives such as conjunction (∧) and negation (¬) as functions on the domain of truth values {true, false}. Since the domain of truth values has been enriched with two more values (unknown, inconsistent), we must 'extend' these functions according to the following logical preconditions.

Requirement 1. The extended function must be the same as the original function wherever the original function is defined.

This requirement follows directly from the definition of function extensions. Here are the truth-table definitions of negation and conjunction when the first requirement is met (a dash means that the value is not established yet).
negation
true            false
false           true
unknown         -
inconsistent    -

conjunction     true      false     unknown   inconsistent
true            true      false     -         -
false           false     false     -         -
unknown         -         -         -         -
inconsistent    -         -         -         -

Figure 2.3.1


Requirement 2. The extended function must be monotonic.

This requirement is justified by denotational semantics. One of the basic results of the mathematical theory of computation is that any extension which is not monotonic is undesirable, since it does not preserve the approximation ordering. The requirement of monotonicity completes the definition of negation.†


negation
true            false
false           true
unknown         unknown
inconsistent    inconsistent

Figure 2.3.2

On the other hand, there are several monotonic extensions of conjunction. All of them have truth values of the following form.

† Notice that since unknown approximates both true and false, its negation must approximate both their negations (i.e. false and true). The only truth value that approximates both false and true is unknown; that is, the negation of unknown is still unknown.
conjunction     true           false     unknown   inconsistent
true            true           false     unknown   inconsistent
false           false          false     -         -
unknown         unknown        -         unknown   -
inconsistent    inconsistent   -         -         inconsistent

Figure 2.3.3


Requirement 3. Whenever inconsistent is one of the arguments, the result is inconsistent. Whenever unknown is one of the arguments and there is no inconsistent argument, the result is the best approximation of all possible outcomes, that is, their least upper bound (lub).

As an illustration, we establish the value of (false ∧ unknown). Unknown could be either true or false. In the first case, false ∧ true = false; in the second case, false ∧ false = false. The best approximation of false and false (i.e. their lub) is, of course, false. Hence, false ∧ unknown = false. The complete definition of conjunction follows.


conjunction     true           false          unknown        inconsistent
true            true           false          unknown        inconsistent
false           false          false          false          inconsistent
unknown         unknown        false          unknown        inconsistent
inconsistent    inconsistent   inconsistent   inconsistent   inconsistent

Figure 2.3.4
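Requirement 3 can be applied mechanically, which makes a convenient check on Figures 2.3.2 and 2.3.4. The Python sketch below assumes a `Null` enumeration for the imperfect truth values and implements "lub of all possible outcomes" in the sense used here: the common outcome if every substitution agrees, otherwise unknown. The names `extend`, `AND` and `NOT` are illustrative, not notation from the thesis.

```python
from enum import Enum
from itertools import product


class Null(Enum):
    OMEGA = "ω"   # inconsistent
    THETA = "ϑ"   # unknown


UNKNOWN, INCONSISTENT = Null.THETA, Null.OMEGA
TRUTH = (True, False)                       # the regular truth values


def lub(outcomes):
    """Best approximation of regular outcomes: their common value if they all
    agree, otherwise ϑ (unknown)."""
    values = set(outcomes)
    return values.pop() if len(values) == 1 else Null.THETA


def extend(op):
    """Apply Requirement 3 to a truth function op on {true, false}."""
    def op_ext(*args):
        if any(a is Null.OMEGA for a in args):
            return Null.OMEGA               # any inconsistent argument wins
        # Replace every unknown argument by each regular truth value.
        choices = [TRUTH if a is Null.THETA else (a,) for a in args]
        return lub(op(*combo) for combo in product(*choices))
    return op_ext


NOT = extend(lambda a: not a)
AND = extend(lambda a, b: a and b)

NAMES = {True: "true", False: "false", UNKNOWN: "unknown", INCONSISTENT: "inconsistent"}
VALUES = list(NAMES)
for a in VALUES:                            # prints the rows of Figure 2.3.4
    print(f"{NAMES[a]:<13}", *(f"{NAMES[AND(a, b)]:<13}" for b in VALUES))
```

Running the loop reproduces the four rows of Figure 2.3.4, and `NOT` reproduces Figure 2.3.2.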


The above intuitive requirements are now formalized. First, we discuss how functions of one argument behave in the presence of imperfection, and then we generalize to functions of several arguments. Let $D_1$, $D_2$ be two domains. We denote by $F(D_1, D_2)$ the set of all functions from $D_1$ to $D_2$. When we add the imperfect elements to each domain, every f in $F(D_1, D_2)$ must be "extended" to apply to the imperfect elements.
Definition 2.1

We say that $f^\circ : D^\circ_1 \rightarrow D^\circ_2$ is an extension of $f : D_1 \rightarrow D_2$ if $f^\circ(v) = f(v)$ for all $v \in D_1$. ■

We denote by $F^\circ(D_1, D_2)$ the set of all function extensions. Only monotonic function extensions are interesting in our context; therefore, it is reasonable to consider only monotonic extensions in $F^\circ(D_1, D_2)$. Observe that, given a function f, all its monotonic extensions form a lattice, with approximation defined as follows: if $f^\circ$ and $f'^\circ$ are two monotonic extensions of f, then $f^\circ \sqsubseteq f'^\circ$ iff $f^\circ(v) \sqsubseteq f'^\circ(v)$ for every $v \in D^\circ_1$.

The least upper bound in this lattice is called the least extension of f.


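Proposition 2.1

For every $f \in F(D_1, D_2)$, the function $f^\circ$ defined as follows is always its least extension:

$$ f^\circ : D^\circ_1 \rightarrow D^\circ_2 : v \mapsto f^\circ(v) := \begin{cases} \omega & \text{if } v = \omega \\ f(v) & \text{if } v \in D_1 \\ \mathrm{lub}\{ f(v') \mid v' \in D_1 \} & \text{if } v = \vartheta \end{cases} $$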
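Proposition 2.1 gives a direct recipe for $f^\circ$ when $D_1$ is finite, as the thesis assumes for database domains. In this Python sketch the helper names and the month example are hypothetical, and `lub` again means "the common value if all outcomes agree, otherwise ϑ".

```python
from enum import Enum
from typing import Callable, Iterable


class Null(Enum):
    OMEGA = "ω"   # nothing
    THETA = "ϑ"   # missing


def lub(outcomes):
    """Common value if all regular outcomes agree, otherwise ϑ."""
    values = set(outcomes)
    return values.pop() if len(values) == 1 else Null.THETA


def least_extension(f: Callable, domain: Iterable) -> Callable:
    """f° of Proposition 2.1 for f : D1 -> D2, assuming D1 is finite."""
    at_theta = lub(f(v) for v in domain)     # lub{ f(v') | v' ∈ D1 }

    def f_ext(v):
        if v is Null.OMEGA:
            return Null.OMEGA
        if v is Null.THETA:
            return at_theta
        return f(v)
    return f_ext


MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
quarter = least_extension(lambda m: MONTHS.index(m) // 3 + 1, MONTHS)
is_abbrev = least_extension(lambda m: len(m) == 3, MONTHS)

print(quarter("May"))          # 2
print(quarter(Null.THETA))     # Null.THETA: a missing month has a missing quarter
print(is_abbrev(Null.THETA))   # True: every completion agrees
```

The second case shows why the lub, rather than a blanket "missing in, missing out" rule, gives the most precise answer: when every completion yields the same value, $f^\circ(\vartheta)$ is that value.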

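Proof

Trivially, $f^\circ$ is an extension of f and is also monotonic. We show that for any other monotonic extension of f, say $f'^\circ$, we have $f'^\circ \sqsubseteq f^\circ$:

i. for every $x \in D_1$, $f'^\circ(x) = f(x) = f^\circ(x)$, from the definition of extensions;

ii. for $x = \omega$, $f'^\circ(x) \sqsubseteq \omega = f^\circ(x)$, since every element approximates ω;

iii. for $x = \vartheta$, note that $f'^\circ(x)$ cannot be ω (otherwise the extension would not be monotonic, since ω approximates only itself). It is easily seen that, however $f'^\circ(x)$ is defined, it always approximates the least upper bound $\mathrm{lub}\{ f(v') \mid v' \in D_1 \} = f^\circ(\vartheta)$. ■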

The above proposition is a semantic argument for the use of least extensions. Note that all other monotonic function extensions approximate this extension. This means that the least extension always gives the most accurate or precise answer when evaluated. We proceed by presenting a syntactic advantage of least extensions, and also a disadvantage.

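Consider the Cartesian product $D_X = D_1 \times D_2 \times \cdots \times D_n$, the domain D, and the function $f : D_X \rightarrow D$. For notational convenience we may think of f as an n-tuple $(f_1 : f_2 : \cdots : f_n)$, where each $f_i$ is defined recursively as:

$$ \begin{aligned} f_n &: D_n \rightarrow D \\ f_{n-1} &: D_{n-1} \rightarrow F(D_n, D) \\ f_{n-2} &: D_{n-2} \rightarrow F(D_{n-1}, F(D_n, D)) \\ &\;\;\vdots \\ f_1 &: D_1 \rightarrow F(D_2, F(D_3, \ldots, F(D_n, D) \ldots)) \end{aligned} $$

The function f applied to an element $(v_1, v_2, \ldots, v_n)$ of $D_X$ is $(f_1(v_1) : f_2(v_2) : \cdots : f_n(v_n))$. This reads as "$f_1$ applied to $v_1$ gives a function $f_2$, which applied to $v_2$ gives a function $f_3$, ...".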
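The $(f_1 : f_2 : \cdots : f_n)$ notation is what is usually called currying. A short Python sketch, where `curry` and the two-argument example function are hypothetical helpers introduced only for illustration:

```python
from typing import Callable


def curry(f: Callable, n: int) -> Callable:
    """Return f1 such that f1(v1)(v2)...(vn) == f(v1, ..., vn)."""
    def step(args: tuple):
        if len(args) == n:
            return f(*args)
        return lambda v: step(args + (v,))
    return step(())


def two_door_80(year: int, doors: int) -> bool:
    """Hypothetical two-argument function f : D_year x D_#ofdoors -> T."""
    return year == 80 and doors == 2


f1 = curry(two_door_80, 2)
f2 = f1(80)          # f1 applied to v1 = 80 gives a function f2 : D_#ofdoors -> T
print(f2(2), f2(4))  # True False
```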

Not every monotonic extension applies smoothly to functions of more than one argument. For example, consider the doubly-strict extensions. We say that $f^\circ : D^\circ_1 \rightarrow D^\circ_2$ is a doubly-strict extension of $f : D_1 \rightarrow D_2$ if $f^\circ(\vartheta) = \vartheta$ and $f^\circ(\omega) = \omega$. There cannot be a doubly-strict extension of a function with several arguments: for instance, $f^\circ(\ldots, \vartheta, \ldots, \omega, \ldots)$ would have to be both ϑ and ω.

Proposition 2.2

The concept of least extension applies smoothly to functions of more than one argument. That is, given a function $f = (f_1 : f_2 : \cdots : f_n)$, the function $(f^\circ_1 : f^\circ_2 : \cdots : f^\circ_n)$ is its least extension.


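Proof

We prove the proposition for the case of two arguments; induction may be used for the general case. From the definitions of $f^\circ_1$ and $f^\circ_2$ it is readily shown that:

$$ f^\circ(v_1, v_2) = (f^\circ_1(v_1) : f^\circ_2(v_2)) = \begin{cases} \omega & \text{if } v_1 = \omega \text{ or } v_2 = \omega \\ f(v_1, v_2) & \text{if } v_1 \in D_1 \text{ and } v_2 \in D_2 \\ \mathrm{lub}\{ f(v'_1, v'_2) \mid (v'_1, v'_2) \in COMPL((v_1, v_2)) \} & \text{if } v_1 = \vartheta \text{ or } v_2 = \vartheta \text{, and neither is } \omega \end{cases} $$

The claim is that $f^\circ$ as constructed above is the least extension of f; that is, for every other extension $f'^\circ$ of f, $f'^\circ \sqsubseteq f^\circ$. When neither argument is ω or ϑ, then by the definition of extensions $f'^\circ = f^\circ$. When at least one of the arguments is an imperfect element, we have five essentially distinct cases, namely the arguments $(\vartheta, \vartheta)$, $(\vartheta, \omega)$, $(\vartheta, v_2)$, $(\omega, v_2)$ and $(\omega, \omega)$. If one of the arguments is ω, then since $f^\circ$ is ω (by definition), it is trivial that $f'^\circ \sqsubseteq f^\circ$. For the case of $(\vartheta, v_2)$ as the argument, we have:

$$ f^\circ(\vartheta, v_2) = (f^\circ_1(\vartheta) : f_2(v_2)) = (\mathrm{lub}\{ f_1(v'_1) \mid v'_1 \in D_1 \} : f_2(v_2)) $$

For any extension $f'^\circ$:

$$ f'^\circ(\vartheta, v_2) = (f'_1(\vartheta) : f'_2(v_2)) = (f'_1(\vartheta) : f_2(v_2)) \sqsubseteq f^\circ(\vartheta, v_2) $$

The case of $(\vartheta, \vartheta)$ as an argument is treated similarly. ■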
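The three cases of the two-argument least extension translate into a short Python sketch, assuming finite domains for both arguments. The helper name `least_extension2` and the example function are hypothetical. The last call shows that the least-extension rule resolves the argument pair (ϑ, ω) without the clash that rules out doubly-strict extensions.

```python
from enum import Enum
from itertools import product
from typing import Callable, Iterable


class Null(Enum):
    OMEGA = "ω"
    THETA = "ϑ"


def lub(outcomes):
    """Common value if all regular outcomes agree, otherwise ϑ."""
    values = set(outcomes)
    return values.pop() if len(values) == 1 else Null.THETA


def least_extension2(f: Callable, d1: Iterable, d2: Iterable) -> Callable:
    """f° for f : D1 x D2 -> D, following the three cases in the proof."""
    d1, d2 = list(d1), list(d2)

    def f_ext(v1, v2):
        if v1 is Null.OMEGA or v2 is Null.OMEGA:
            return Null.OMEGA                       # ω takes precedence over ϑ
        c1 = d1 if v1 is Null.THETA else [v1]       # only ϑ components vary
        c2 = d2 if v2 is Null.THETA else [v2]
        return lub(f(a, b) for a, b in product(c1, c2))
    return f_ext


f = least_extension2(lambda y, d: y == 80 and d == 2, range(70, 90), [2, 4])
print(f(80, 2))                    # True: no nulls, f° agrees with f
print(f(80, Null.THETA))           # Null.THETA: the outcome depends on d
print(f(Null.THETA, 4))            # False: no year makes a 4-door car match
print(f(Null.THETA, Null.OMEGA))   # Null.OMEGA
```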

We now present a proposition that illustrates a disadvantage of least extensions, one which is a drawback for efficient function evaluation.

Proposition 2.3

The concept of least extensions does not carry over smoothly to the composition of functions. That is, the composition of least extensions is not always a least extension.


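Proof

We prove the above by presenting a counterexample: two functions whose least extensions, when composed, do not give the least extension of their composition.

Consider the function f that maps integers to even integers:

$$ f : Int \rightarrow Int : x \mapsto f(x) := \begin{cases} x & \text{if } x \text{ is even} \\ x + 1 & \text{if } x \text{ is odd} \end{cases} $$

and its least extension:

$$ f^\circ : Int^\circ \rightarrow Int^\circ : x \mapsto f^\circ(x) := \begin{cases} \omega & \text{if } x = \omega \\ f(x) & \text{if } x \in Int \\ \mathrm{lub}\{ f(x') \mid x' \in Int \} & \text{if } x = \vartheta \end{cases} = \begin{cases} \omega & \text{if } x = \omega \\ f(x) & \text{if } x \in Int \\ \vartheta & \text{if } x = \vartheta \end{cases} $$

since the values f(x') are not all equal (f takes every even integer as a value).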

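Consider also the function (predicate) g, which maps integers to truth values, and its least extension $g^\circ$:

$$ g : Int \rightarrow T : x \mapsto g(x) := \begin{cases} true & \text{if } x \text{ is even} \\ false & \text{if } x \text{ is odd} \end{cases} $$

$$ g^\circ : Int^\circ \rightarrow T^\circ : x \mapsto g^\circ(x) := \begin{cases} \text{inconsistent} & \text{if } x = \omega \\ g(x) & \text{if } x \in Int \\ \mathrm{lub}\{ g(x') \mid x' \in Int \} & \text{if } x = \vartheta \end{cases} = \begin{cases} \text{inconsistent} & \text{if } x = \omega \\ g(x) & \text{if } x \in Int \\ \text{unknown} & \text{if } x = \vartheta \end{cases} $$

since there are both even and odd integers.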

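Note that both lubs in the definitions of $f^\circ$ and $g^\circ$ are ϑ. The composition of the two functions f and g is:

$$ g \circ f : Int \rightarrow T : x \mapsto (g \circ f)(x) = g(f(x)) := true $$

(since f(x) is always even). By applying the above definitions, it is easy to verify that the least extension of $g \circ f$ is

$$ (g \circ f)^\circ(x) := \begin{cases} \text{inconsistent} & \text{if } x = \omega \\ true & \text{otherwise} \end{cases} $$

while the composition of the least extensions $f^\circ$ and $g^\circ$ is:

$$ (g^\circ \circ f^\circ)(x) := \begin{cases} \text{inconsistent} & \text{if } x = \omega \\ true & \text{if } x \in Int \\ \text{unknown} & \text{if } x = \vartheta \end{cases} $$

Since unknown is strictly less precise than true, $g^\circ \circ f^\circ$ differs from $(g \circ f)^\circ$ at ϑ and is therefore not the least extension of $g \circ f$. ■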
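The counterexample can be checked numerically. This Python sketch replaces the infinite domain Int with a hypothetical finite range (in line with the earlier remark that database domains are finite) and compares $(g \circ f)^\circ$ with $g^\circ \circ f^\circ$; all helper names are illustrative.

```python
from enum import Enum
from typing import Callable, Iterable


class Null(Enum):
    OMEGA = "ω"   # nothing / inconsistent
    THETA = "ϑ"   # missing / unknown


def lub(outcomes):
    """Common value if all regular outcomes agree, otherwise ϑ."""
    values = set(outcomes)
    return values.pop() if len(values) == 1 else Null.THETA


def least_extension(f: Callable, domain: Iterable) -> Callable:
    """f° of Proposition 2.1 over a finite domain."""
    at_theta = lub(f(x) for x in domain)

    def f_ext(x):
        if x is Null.OMEGA:
            return Null.OMEGA
        return at_theta if x is Null.THETA else f(x)
    return f_ext


INT = range(-50, 51)   # hypothetical finite stand-in for Int


def f(x: int) -> int:
    return x if x % 2 == 0 else x + 1     # maps integers to even integers


def g(x: int) -> bool:
    return x % 2 == 0                     # true iff x is even


f_ext = least_extension(f, INT)
g_ext = least_extension(g, INT)
gf_least = least_extension(lambda x: g(f(x)), INT)   # (g∘f)°


def gf_composed(x):
    return g_ext(f_ext(x))                           # g°∘f°


print(gf_least(Null.THETA))     # True
print(gf_composed(Null.THETA))  # Null.THETA (unknown): less precise
```

The two functions agree on every regular integer and on ω; they differ only at ϑ, exactly as in the proof.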


Summarizing, in this section we presented a general rule for function extensions, the least-extension rule. Our selection of this rule was justified on semantic grounds, since a least extension is the best possible approximation to a precise value among all monotonic extensions of a function. In addition, the rule has the nice syntactic property of applying smoothly to functions of many arguments. The final proposition in this chapter illustrated a problem with compositions of function extensions. This problem is of significant importance for query evaluation, which is considered in the next chapter.


Chapter  3 


Query  Evaluation  and 
Modification  Operations 


Nothing can be created out of nothing


-  Lucretius 


3.1.  Least  Extensions  of  Queries 

In the previous chapter we defined a query on a relational database as a function from tuples, which are themselves functions, to truth values. We now present the least extension of a query evaluated on tuples that may contain null values.

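Definition 3.1

Given a set of attributes $X = \{A_1, A_2, \ldots, A_n\}$ with its corresponding domain $D^\circ_X$, and a tuple $t = (v_1, v_2, \ldots, v_n)$ (a function from X to $D^\circ_X$), the extended query is defined as:

$$ Q^\circ(t) := \begin{cases} \text{inconsistent} & \text{if } \exists i,\ 1 \le i \le n : v_i = \omega \\ Q(t) & \text{if } \forall i,\ 1 \le i \le n : v_i \in D_{A_i} \\ \mathrm{lub}\{ Q(t') \mid t' \in COMPL(t) \} & \text{otherwise} \end{cases} $$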

What the above definition says is the following. The value of the extended query is identical to the value of the conventional query when no null values are present. If any of the tuple elements is the nothing null value (ω), the query value on this tuple is the inconsistent truth value; the meaning here is that the answer is both true and false. When we have a missing null value (ϑ) in a tuple, the value of the extended query for this tuple is true (false) if and only if every substitution of the null with a regular value results in a tuple for which the conventional query evaluates to true (false). If for some substitutions of the missing null value the conventional query is true while for others it is false, then it is not possible to give a precise value to the extended query. The model acknowledges its imperfection with a maybe, unknown, or I can't tell answer.
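Definition 3.1 can be sketched in Python by enumerating COMPL(t) and combining the outcomes. The sketch assumes each attribute that may be missing has a finite domain supplied by the caller; `extend_query` and the door-count examples are hypothetical illustrations rather than the thesis's own notation.

```python
from enum import Enum
from itertools import product
from typing import Callable, Dict, Iterable, Mapping


class Null(Enum):
    OMEGA = "ω"   # nothing  -> the query value is inconsistent
    THETA = "ϑ"   # missing  -> substitute every regular value


def extend_query(q: Callable[[Dict], bool],
                 domains: Mapping[str, Iterable]) -> Callable[[Dict], object]:
    """Q° of Definition 3.1; `domains` gives a finite D_A for each attribute
    that may be missing."""
    def q_ext(t: Dict):
        if any(v is Null.OMEGA for v in t.values()):
            return Null.OMEGA                               # inconsistent
        attrs = list(t)
        choices = [list(domains[a]) if t[a] is Null.THETA else [t[a]]
                   for a in attrs]
        # Evaluate the conventional query on every completion t' in COMPL(t).
        outcomes = {q(dict(zip(attrs, combo))) for combo in product(*choices)}
        return outcomes.pop() if len(outcomes) == 1 else Null.THETA  # lub
    return q_ext


# Hypothetical queries; the domain assumes a car has two or four doors.
has_doors = extend_query(lambda t: t["doors"] in (2, 4), {"doors": [2, 4]})
is_coupe = extend_query(lambda t: t["doors"] == 2, {"doors": [2, 4]})
print(has_doors({"doors": Null.THETA}))   # True: every completion agrees
print(is_coupe({"doors": Null.THETA}))    # Null.THETA: maybe
print(is_coupe({"doors": Null.OMEGA}))    # Null.OMEGA: inconsistent
```

Enumerating completions is exponential in the number of missing values; this is only the direct application of the definition, not the more economical evaluation method the chapter introduction refers to.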

Following Codd's terminology, we distinguish two results for a query. The true-result, $TR(Q^\circ)$, contains the tuples for which $Q^\circ$ evaluates to true. Similarly, the tuples for which the extended query is unknown are in the maybe-result, $MR(Q^\circ)$. These are the only results that the user sees. Whenever a tuple evaluates to false or inconsistent, it is rejected from the resulting answer. We also note the unsophisticated treatment of the ω null value: no attempt is ever made to resolve the inconsistency which ω represents.

We now present some examples to show how queries are evaluated in our approach. For the examples we use the relation introduced in the first chapter. In particular, the second example is the query we used to criticize truth-functional, many-valued logic approaches to a formal treatment of nulls.

Consider the relation:

CARS
tuple   make          year   mpg   #ofdoors
t1      turtles       79     55    4
t2      gasguzzler    80     ϑ     2
t3      fastcars      80     26    ϑ
t4      neverstops    79     ω     ϑ

Figure 3.1


The relation instance has four tuples, which we refer to as t1, t2, t3 and t4, in that order. Note that the mpg value in t4 is inconsistent, while the mpg value in t2 is missing. Similarly, the #ofdoors value in both t3 and t4 is known to exist but is missing.

Example 1. Consider the query: "Return all cars with miles per gallon consumption over fifty". Expressed in a relational calculus-like language this query is:

LIST CARS WHERE (CARS.mpg > 50)

Formally, the value of the extended query is defined as:

$$ Q^\circ : D^\circ_{mpg} \rightarrow T^\circ : m \mapsto Q^\circ(m) := greater^\circ(m, 50) $$

We now evaluate this query on each tuple according to our definition:

$Q^\circ(t_1[mpg]) = Q(t_1[mpg]) = true$ (trivial)

$Q^\circ(t_2[mpg]) = greater^\circ(t_2[mpg], 50) = greater^\circ(\vartheta, 50) = unknown$ (for some mpg values, e.g. 30, greater(30, 50) = false, while for others, e.g. 60, greater(60, 50) = true)

$Q^\circ(t_3[mpg]) = Q(t_3[mpg]) = false$ (trivial)

$Q^\circ(t_4[mpg]) = inconsistent$ (immediate from our definition)
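Example 1 can be reproduced with a few lines of Python. The sketch assumes a finite mpg domain of 0 to 99, which the text does not specify, and encodes only the mpg column of Figure 3.1; `greater_ext` is a hypothetical name for $greater^\circ$.

```python
from enum import Enum


class Null(Enum):
    OMEGA = "ω"   # inconsistent / nothing
    THETA = "ϑ"   # unknown / missing


MPG_DOMAIN = range(0, 100)   # assumed finite mpg domain; not given in the text


def greater_ext(m, bound: int):
    """greater°(m, bound) by the least-extension rule."""
    if m is Null.OMEGA:
        return Null.OMEGA
    candidates = MPG_DOMAIN if m is Null.THETA else [m]
    outcomes = {x > bound for x in candidates}
    return outcomes.pop() if len(outcomes) == 1 else Null.THETA


MPG = {"t1": 55, "t2": Null.THETA, "t3": 26, "t4": Null.OMEGA}  # Figure 3.1
for name, mpg in MPG.items():
    print(name, greater_ext(mpg, 50))
# t1 True / t2 Null.THETA / t3 False / t4 Null.OMEGA
```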

Example 2. Consider now the query: "Return all 2-door, 80 models, and all the 4-door models with miles per gallon consumption more than 20". Expressed in a relational calculus-like language this query is:

LIST CARS WHERE
((CARS.year = 80) AND (CARS.#ofdoors = 2))
OR ((CARS.mpg > 20) AND (CARS.#ofdoors = 4))

Formally, the value of the extended query is:

$$ Q^\circ : D^\circ_{year} \times D^\circ_{mpg} \times D^\circ_{\#ofdoors} \rightarrow T^\circ : (y, m, d) \mapsto Q^\circ(y, m, d) := \big( OR(AND(equal(y, 80), equal(d, 2)),\ AND(greater(m, 20), equal(d, 4))) \big)^\circ $$
Let us determine whether $t_3$ belongs to the true-result of the query. The values for $y$ (year) and $m$ (mpg) are known. We have a missing value for the number of doors $d$. We now show that, in our approach, the fact that the number of doors is unknown is immaterial for this query. The assumption is that a car may only have two or four doors. According to our definition, we substitute the missing value with its possible values and evaluate the conventional query.

i) $t_3[\#ofdoors] = 2$:

$Q(t_3[year, mpg, \#ofdoors]) = Q(80, 26, 2) = OR(AND(true, true), AND(true, false)) = OR(true, false) = true$

ii) $t_3[\#ofdoors] = 4$:

$Q(t_3[year, mpg, \#ofdoors]) = Q(80, 26, 4) = OR(AND(true, false), AND(true, true)) = OR(false, true) = true$

Hence, $t_3$ belongs to the true-result of the query.
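The substitution argument generalizes to several attributes: enumerate the completions of the tuple and evaluate the conventional query on each. A minimal sketch, assuming Python dicts for tuples, a sentinel object for $\vartheta$, no $\omega$ values, and hypothetical domains for year and mpg; only the two-or-four-door assumption comes from the text.

```python
from itertools import product

MISSING = object()  # sentinel for the missing value ϑ

# Hypothetical finite domains; only the two-or-four-door assumption is from the text.
DOMAINS = {"year": range(70, 90), "mpg": range(0, 101), "doors": (2, 4)}


def conventional_query(year: int, mpg: int, doors: int) -> bool:
    # ((year = 80) AND (#ofdoors = 2)) OR ((mpg > 20) AND (#ofdoors = 4))
    return (year == 80 and doors == 2) or (mpg > 20 and doors == 4)


def completions(tup: dict):
    """Yield every null-free tuple obtained by substituting the missing values."""
    attrs = list(tup)
    choices = [DOMAINS[a] if tup[a] is MISSING else (tup[a],) for a in attrs]
    for values in product(*choices):
        yield dict(zip(attrs, values))


def evaluate_by_substitution(tup: dict) -> str:
    outcomes = {conventional_query(**c) for c in completions(tup)}
    if outcomes == {True}:
        return "true"
    if outcomes == {False}:
        return "false"
    return "unknown"


t3 = {"year": 80, "mpg": 26, "doors": MISSING}
print(evaluate_by_substitution(t3))  # true
```

Both completions of $t_3$ satisfy the conventional query, so the sketch reports true, as derived in cases i) and ii).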

It is easily verified that $\hat{Q}(t_1) = true$, $\hat{Q}(t_2) = unknown$ and $\hat{Q}(t_4) = inconsistent$.

The major difference between this approach and a truth-functional approach is that, in a truth-functional approach, a query is first divided into smaller (atomic) queries before being evaluated. The atomic queries are then evaluated and the resulting truth values are combined with the Boolean connectives. This is prohibited syntactically in our approach, since from Proposition 2.3 we know that the composition of least extensions is not always a least extension. It is illustrated with the above example that, for any query $Q$,

$TR^{C}(\hat{Q}) \subseteq TR(\hat{Q})$ and $MR(\hat{Q}) \subseteq MR^{C}(\hat{Q})$

where the superscript $C$ denotes the results of truth-functional approaches (e.g. Codd's approach).
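For contrast, here is a sketch of a truth-functional evaluation: each atomic comparison on a missing value yields unknown, and the results are combined with Kleene's three-valued AND/OR. This is our generic stand-in for truth-functional approaches such as Codd's, not a transcription of his rules; `None` plays the role of unknown, and $\vartheta$ is a sentinel object.

```python
from typing import Callable, Optional

MISSING = object()      # sentinel for the missing value ϑ
Truth = Optional[bool]  # None plays the role of "unknown"


def k_and(a: Truth, b: Truth) -> Truth:
    if a is False or b is False:
        return False
    return True if (a is True and b is True) else None


def k_or(a: Truth, b: Truth) -> Truth:
    if a is True or b is True:
        return True
    return False if (a is False and b is False) else None


def atom(value, test: Callable[[int], bool]) -> Truth:
    """Atomic comparison; unknown whenever its operand is missing."""
    return None if value is MISSING else test(value)


def truth_functional(year, mpg, doors) -> Truth:
    return k_or(
        k_and(atom(year, lambda y: y == 80), atom(doors, lambda d: d == 2)),
        k_and(atom(mpg, lambda m: m > 20), atom(doors, lambda d: d == 4)),
    )


print(truth_functional(80, 26, MISSING))  # None: t3 lands in the maybe-result
```

On $t_3$ both conjunctions are unknown, so the disjunction is unknown as well; substitution instead places $t_3$ in the true-result, which is the inclusion $TR^{C} \subseteq TR$ in action.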

Looking at it from another viewpoint, the rules that Codd uses for the evaluation of a query are "sound" but not "complete" for our interpretation of the missing value. If, using the null-substitution principle, we can determine that a tuple belongs to the true-result, then there is no doubt that it really belongs there. On the other hand, a "no" answer, i.e. the tuple does not belong to the true-result, is not always correct; we do not have the same level of confidence in a "no" answer.

We close this section by presenting a proposition that ties together the definition of qualitative approximation between tuples and the definition of query evaluation. In particular, we prove that approximation can be defined in three equivalent ways.

Proposition 3.1

Consider the Cartesian product lattice $D_X$, and two tuples $t = (v_1, v_2, \ldots, v_n)$, $t' = (v'_1, v'_2, \ldots, v'_n)$. The following are equivalent:

1. $t \sqsubseteq t'$ iff $v_i \sqsubseteq v'_i$, $\forall i$, $1 \le i \le n$

2. $t \sqsubseteq t'$ iff $COMPL(t) \supseteq COMPL(t')$ †

3. $t \sqsubseteq t'$ iff $\hat{Q}(t) \sqsubseteq \hat{Q}(t')$ for every query $Q$

Proof

(1) ⇒ (2) (trivial)

(2) ⇒ (3) Assume that $t, t'$ do not have $\omega$ elements. In such a case, for every tuple $w$ and every query $Q$, $\hat{Q}(w) = lub\{Q(u) \mid u \in COMPL(w)\}$. Obviously, whenever $COMPL(t) \supseteq COMPL(t')$, the set of truth values $\{Q(u) \mid u \in COMPL(t)\} \supseteq \{Q(u) \mid u \in COMPL(t')\}$. But the more truth values we have in these sets, the less approximate the lub is. Thus, $\hat{Q}(t) \sqsubseteq \hat{Q}(t')$. If $t$ or $t'$ has an $\omega$ element, the $\omega$ null value remains in all completions (see the footnote below), and therefore the proposition trivially holds.

(3) ⇒ (1) Assume that $\hat{Q}(t) \sqsubseteq \hat{Q}(t')$ for every query $Q$. We will show that $t \sqsubseteq t'$. Suppose that $t$ does not approximate $t'$. Hence, there is an $i$ such that $v_i$ does not approximate $v'_i$. We distinguish the following cases. Suppose first that neither $v_i$ nor $v'_i$ is $\vartheta$ or $\omega$. The query $Q := (A_i = v_i)$ will evaluate to true for $t$, while it will be false for $t'$. Therefore $\hat{Q}(t)$ does not approximate $\hat{Q}(t')$, a contradiction. In the second case, suppose that $v_i$ is a regular value and $v'_i$ is $\vartheta$. Again the above query $Q$ will be true on $t$ and unknown on $t'$, and the same contradiction is derived. Finally, when $v_i$ is $\omega$ and $v'_i$ is a regular value, the contradiction is derived for any query $Q$. ■

† The set of completions of a tuple $t$ was defined in section 2.1 as containing tuples with no $\omega$ values. The reason was that this set is used in query evaluation only when $t$ has no $\omega$ nulls. For the present context, we modify our definition of $COMPL$ so that, if a tuple $t$ has an $\omega$, then this $\omega$ remains in all completions of $t$. Hence the tuples in $COMPL(t)$ may belong in

The above proposition captures the intuition about approximation between tuples. We say that a tuple is a better approximation (in a qualitative sense) than another tuple if it represents fewer possibilities (its set of completions is smaller). This is the same as saying that any query on the first tuple will have a more precise answer than the same query on the second tuple.
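For $\omega$-free tuples over a small finite domain, the equivalence of conditions 1 and 2 of Proposition 3.1 can be checked exhaustively. A sketch under our own assumptions: a hypothetical three-value domain, $\vartheta$ as a sentinel object, and only (1) ⇔ (2) is checked; the query-based condition 3 is not.

```python
from itertools import product

MISSING = object()        # sentinel for the missing value ϑ
DOMAIN = ("a", "b", "c")  # hypothetical three-value attribute domain


def value_approx(v, w) -> bool:
    """v ⊑ w for ω-free values: ϑ approximates everything, otherwise equality."""
    return v is MISSING or v == w


def pointwise_approx(t, u) -> bool:
    """Condition 1 of Proposition 3.1."""
    return all(value_approx(v, w) for v, w in zip(t, u))


def compl(t) -> set:
    """COMPL(t): substitute every domain value for each ϑ."""
    return set(product(*[DOMAIN if v is MISSING else (v,) for v in t]))


def compl_approx(t, u) -> bool:
    """Condition 2 of Proposition 3.1: COMPL(t) ⊇ COMPL(u)."""
    return compl(t) >= compl(u)


values = DOMAIN + (MISSING,)
pairs = [(t, u) for t in product(values, repeat=2) for u in product(values, repeat=2)]
assert all(pointwise_approx(t, u) == compl_approx(t, u) for t, u in pairs)
print(len(pairs), "pairs agree")  # 256 pairs agree
```

The check needs at least two values in the domain; with a single value, $\vartheta$ and that value would have identical completions.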

In this section we examined the basic definition of query evaluation in a framework of imperfect information. The next section concentrates more on the practical aspects of query evaluation. In particular, we present an algorithm for query evaluation which is more efficient than the straightforward application of our definition.

3.2.  An  Algorithm  for  Query  Evaluation 

Application of the definition of an extended query evaluation on a tuple with null values requires at most $k^n$ substitutions and conventional query evaluations. The value $k$ is the maximum domain size among the $n$ attribute domains. Notice here that the dominating factor is $k$ and not the exponent $n$. Such a complexity measure precludes, in practice, any consideration of an algorithm along the lines of the query evaluation rule. In this section we develop an algorithm for extended query evaluation which has superior complexity. For the new algorithm no substitutions of nulls with regular values are necessary. The algorithm is based on the transformation of queries to suitable forms (symbolic manipulations). For simplicity, we drop the superscript from our notation in the sequel, but it is understood that we are still considering lattices and least monotonic function extensions.

The atomic formulas, or primitive terms, in a query have the following form: $p := (A \; op \; v)$, where $A$ is an attribute name, $v$ is a value in $D_A$, and $op$ is one of the operators $=, >, <$. We assume that in all queries the only partitions of an attribute domain are given by $p$ and $\neg p$.

Lemma 3.1

Let $Q$ be a query with the special form:

$Q := p_1 \wedge p_2 \wedge \cdots \wedge p_k$

where, for each $p_i$:

1. $p_i$ is a primitive term;

2. $\neg p_i$ is not in $Q$.

Then for each tuple $t = (a_1, a_2, \ldots, a_n)$ where $a_j \ne \omega$, $\forall j$,

$Q(t) = false \iff \exists p_i : p_i(a_j) = false, \quad 1 \le j \le n, \; 1 \le i \le k$

Proof

Suppose that $k = 2$, that is: $Q = p_1 \wedge p_2$. If $p_1$ and $p_2$ are defined on different attribute domains, the lemma trivially holds. Assume that $p_1$ and $p_2$ are defined on the same attribute domain $A$.

(if part). Suppose one of $p_1, p_2$ is false on $t[A]$, say $p_1(t[A]) = false$. By definition of the least extended conjunction, $(p_1 \wedge p_2)(t[A]) = false$.

(only-if part). Suppose that $(p_1 \wedge p_2)(t[A]) = false$ but neither $p_1$ nor $p_2$, evaluated on $t[A]$, is false. We derive a contradiction. If $p_1(t[A]) = true$, then $t[A]$ must be a regular value (not null) and hence $p_2(t[A]) = true$. Therefore, $(p_1 \wedge p_2)(t[A]) = true$. If $t[A] = \vartheta$, then both $p_1$ and $p_2$ are unknown when evaluated on $t[A]$. Since $p_1 \ne \neg p_2$, there are values $v$ in $D_A$ which we can substitute for $\vartheta$ such that $p_1(v) = p_2(v) = true$. For all these values, $(p_1 \wedge p_2)(v) = true$. Hence, $Q(t[A]) \ne false$. The above is easily generalized by induction to the case $k > 2$. ■




Lemma 3.2

Let $Q$ be a query with the special form:

$Q := p_1 \vee p_2 \vee \cdots \vee p_k$

where, for each $p_i$:

1. $p_i$ is a primitive term;

2. $\neg p_i$ is not in $Q$.

Then for each tuple $t = (a_1, a_2, \ldots, a_n)$ where $a_j \ne \omega$, $\forall j$, $1 \le j \le n$:

$Q(t) = true \iff \exists p_i : p_i(a_j) = true$

Proof

The proof is similar to the one in Lemma 3.1. ■
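Lemmas 3.1 and 3.2 mean that a pure conjunction or disjunction of primitive terms can be evaluated one term at a time, without substituting nulls. A minimal sketch, assuming tuples are Python dicts, $\vartheta$ is a sentinel object, $\omega$ is absent, and each attribute is tested by a single primitive term (the section's partition assumption). The lemmas fix when the answer is false (conjunction) or true (disjunction); the remaining "unknown" fallback is our reading of the other cases under these assumptions.

```python
MISSING = object()  # sentinel for the missing value ϑ; ω is assumed absent


def term_value(value, test) -> str:
    """A primitive term on one attribute value."""
    if value is MISSING:
        return "unknown"
    return "true" if test(value) else "false"


def conjunction(tup: dict, terms) -> str:
    """Lemma 3.1: p1 ∧ ... ∧ pk is false iff some p_i is false."""
    vals = [term_value(tup[attr], test) for attr, test in terms]
    if "false" in vals:
        return "false"
    return "true" if all(v == "true" for v in vals) else "unknown"


def disjunction(tup: dict, terms) -> str:
    """Lemma 3.2: p1 ∨ ... ∨ pk is true iff some p_i is true."""
    vals = [term_value(tup[attr], test) for attr, test in terms]
    if "true" in vals:
        return "true"
    return "false" if all(v == "false" for v in vals) else "unknown"


t3 = {"year": 80, "mpg": 26, "doors": MISSING}
print(conjunction(t3, [("year", lambda y: y == 80), ("doors", lambda d: d == 2)]))  # unknown
print(disjunction(t3, [("doors", lambda d: d == 2), ("mpg", lambda m: m > 20)]))   # true
```

Each call inspects every term once, which is the linear cost the lemmas promise.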

The above lemmas state that, if a query is a disjunctive/conjunctive normal form of primitive terms, then with at most $n$ operations we can determine whether the query has the value true/false for a particular tuple. The next step is the consideration of the Principal Disjunctive Normal Form (PDNF) and the Principal Conjunctive Normal Form (PCNF) of a query $Q$. We are interested in principal normal forms because they are unique. Let us briefly recall their definitions [Tremblay and Manohar 75].

A minterm $m_i$ is any product (series of conjunctions) of the primitive terms or their negations, such that a primitive term and its negation never both appear in $m_i$. There are $2^n$ possible minterms for $n$ primitive terms, and there is an ordering between them.† For instance, with $n = 2$, we have the following minterms (in order): $\neg p_1 \wedge \neg p_2$, $\neg p_1 \wedge p_2$, $p_1 \wedge \neg p_2$, $p_1 \wedge p_2$. Having this conventional order, we can talk about the $i$-th minterm. The $PDNF(Q)$ is a sum (series of disjunctions) of minterms and is denoted by:

$PDNF(Q) := \sum m_{i_1}, m_{i_2}, \ldots, m_{i_k}$

Using duality we can obtain the maxterms and the:

$PCNF(Q) := \prod M_{j_1}, M_{j_2}, \ldots, M_{j_m}, \quad 0 \le j_r \le 2^n - 1, \; 1 \le r \le m$

† The conventional order is obtained as follows. First, the $n$ primitive terms are ordered arbitrarily (e.g. $p_1, p_2, \ldots, p_n$). The minterm in position $i$, $m_i$, is obtained by considering the binary representation of $i$ using $n$ digits (padding 0's on the left if necessary); a "1" in position $j$ means that the primitive term $p_j$ appears in $m_i$; a "0" in position $j$ means that $\neg p_j$ appears in $m_i$.

A recursive algorithm that transforms a query $Q$ to its PDNF is given in [Tremblay and Manohar 75]. The $PCNF(Q)$ can be obtained directly from the $PDNF(Q)$ using duality. In classical two-valued predicate calculus it is a well-known fact that, under any interpretation, the truth value of a query is the same as the truth value of any of its normal forms. The next lemma shows that this is also the case for extended queries in our framework.

Lemma 3.3

Let $P$ be a query and $Q$ its least extension. If $\tau$ is a syntactic transformation of $P$ such that $\tau(P)(u) = P(u)$ for all tuples $u$, then $\tau(Q)(t) = Q(t)$ for all tuples $t$ (possibly with null values).

Proof

Assume first that $t$ has an $\omega$ value. In this case the value of any query on $t$ is inconsistent. From our extension rule, when $t$ has no $\omega$'s,

$Q(t) = true \iff P(t') = true, \; \forall t' \in COMPL(t)$

$\Rightarrow Q(t) = true \iff \tau(P)(t') = true, \; \forall t' \in COMPL(t)$ (lemma assumption)

$\Rightarrow Q(t) = true \iff \tau(Q)(t') = true, \; \forall t' \in COMPL(t)$ (syntactically the same)

$\Rightarrow Q(t) = true \iff \tau(Q)(t) = true$ (definition of extensions)

A similar argument holds for the false truth value. The case of $Q(t) = unknown$ is covered by all the other situations. We have shown that $\tau(Q)(t) = Q(t)$ for every $t$. ■

Theorem 3.1

Consider a query $Q$, its PDNF, PCNF and a tuple $t = (a_1, a_2, \ldots, a_n)$ such that $a_i \ne \omega$, $1 \le i \le n$. Then:

1. $Q(t) = PDNF(Q)(t) = false$ iff $m_j(t) = false$, $\forall$ minterm $m_j$

2. $Q(t) = PCNF(Q)(t) = true$ iff $M_j(t) = true$, $\forall$ maxterm $M_j$

Proof

We prove that $PDNF(Q)(t) = false$ iff $m_j(t) = false$ for every minterm $m_j$.

(if part). Trivial.

(only-if part). Suppose $PDNF(Q)(t) = false$ but there exists a minterm $m_j$ which is not false for $t$. By construction of minterms, $m_j \ne \neg m_k$ for every $m_k$ in $PDNF(Q)$. Hence, $(m_j \vee m_k)(t) \ne false$. Using induction on the number of minterms, it is easily shown that $PDNF(Q)(t) \ne false$. The second part of the theorem is proven with the dual argument. ■

An algorithm based on the above result is presented in Figure 3.2. Basically, for every tuple $t$ the algorithm determines, by elimination of cases, to which of the query results the tuple belongs. The important aspect of the algorithm is that no substitutions of the null values are ever performed. As an illustration of this algorithm, consider the query in Example 2 of the previous section.

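LIST CARS WHERE

((CARS.year = 80) AND (CARS.#ofdoors = 2))

OR ((CARS.mpg > 20) AND (CARS.#ofdoors = 4))

Its qualification condition can be written as $(p_1 \wedge p_2) \vee (\neg p_1 \wedge p_3)$, where $p_1, p_2, p_3$ are the simple predicates: $p_1$ = (CARS.#ofdoors = 2), $p_2$ = (CARS.year = 80), and $p_3$ = (CARS.mpg > 20). (Since a car has either two or four doors, CARS.#ofdoors = 4 is expressed as $\neg p_1$.)

The principal normal forms for this query are:

$PCNF(Q) = \prod M_0, M_2, M_4, M_5 = (p_1 \vee p_2 \vee p_3) \wedge (p_1 \vee \neg p_2 \vee p_3) \wedge (\neg p_1 \vee p_2 \vee p_3) \wedge (\neg p_1 \vee p_2 \vee \neg p_3)$

$PDNF(Q) = \sum m_7, m_6, m_3, m_1 = (p_1 \wedge p_2 \wedge p_3) \vee (p_1 \wedge p_2 \wedge \neg p_3) \vee (\neg p_1 \wedge p_2 \wedge p_3) \vee (\neg p_1 \wedge \neg p_2 \wedge p_3)$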
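The minterm and maxterm indices above can be checked mechanically. The sketch below reads them off a truth table, numbering assignments by the binary convention of the footnote on minterm order (the first primitive term is the leftmost digit). It is a brute-force alternative of our own, not the recursive transformation of [Tremblay and Manohar 75]; the helper names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple


def principal_forms(f: Callable[..., bool], n: int) -> Tuple[List[int], List[int]]:
    """Indices of the minterms of PDNF(f) and the maxterms of PCNF(f).

    Assignment i is the n-digit binary form of i; its leftmost digit is the
    value of p1. Minterm m_i is true only on assignment i; maxterm M_i is
    false only on assignment i.
    """
    minterms, maxterms = [], []
    for i, bits in enumerate(product((False, True), repeat=n)):
        (minterms if f(*bits) else maxterms).append(i)
    return minterms, maxterms


def render(i: int, n: int, maxterm: bool = False) -> str:
    """Spell out m_i (or M_i) with the footnote's digit convention."""
    digits = format(i, f"0{n}b")
    # minterm: '1' -> p_j, '0' -> ¬p_j; maxterm: the reverse, joined by ∨
    keep_plain = "0" if maxterm else "1"
    lits = [f"p{j + 1}" if d == keep_plain else f"¬p{j + 1}" for j, d in enumerate(digits)]
    return "(" + (" ∨ " if maxterm else " ∧ ").join(lits) + ")"


# Propositional skeleton of Example 2: (p1 ∧ p2) ∨ (¬p1 ∧ p3)
query = lambda p1, p2, p3: (p1 and p2) or (not p1 and p3)
pdnf, pcnf = principal_forms(query, 3)
print(pdnf, pcnf)          # [1, 3, 6, 7] [0, 2, 4, 5]
print(render(6, 3))        # (p1 ∧ p2 ∧ ¬p3)
print(render(5, 3, True))  # (¬p1 ∨ p2 ∨ ¬p3)
```

The enumeration visits all $2^n$ assignments, so it only suits queries with a handful of primitive terms.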

Let us evaluate the query for the third tuple $t_3$ of our example relation CARS. The values in the tuple are mpg = 26, year = 80, #ofdoors = $\vartheta$. There are no $\omega$ values in the tuple, so we proceed to the next step of the algorithm. For every maxterm $M_j$, we have at least one $p_i$ such that $p_i(t_3) = true$. In particular, $p_2$ is true and appears unnegated in $M_0$, $M_4$ and $M_5$, and $p_3$ is true and appears unnegated in $M_2$. Therefore, $t_3$ belongs to the true-result of the query. In the same way the query is evaluated for the other tuples in the relation CARS.


Figure 3.2  Query-Evaluation

Given a relation r and a query Q, evaluate Q on tuples t = (v1, v2, ..., vn) and output the true-result, TR(Q), and the maybe-result, MR(Q).

EVALUATE_QUERY(relation r, query Q)
begin
    comment: For simplicity, it is assumed that Q is evaluated on all
        values in t. Alternatively, t may be replaced with t[X], where
        X is the set of attributes on which Q is evaluated. Furthermore,
        since the principal normal forms have all the primitive terms in
        each maxterm/minterm, p_i(t) refers to p_i(v_i), v_i in t.

    TR(Q) ← ∅
    MR(Q) ← ∅

    construct PDNF(Q), PCNF(Q)
    for each t in r do
        if ∀ i, v_i ≠ ω
            comment: no inconsistencies in t
        then
        begin
            if ∀ M_j in PCNF(Q), ∃ p_i : p_i(t) = true
            then TR(Q) ← TR(Q) ∪ {t}
            else if ∃ m_j in PDNF(Q) : ∀ p_i, p_i(t) ≠ false
            then MR(Q) ← MR(Q) ∪ {t}
        end
    end
end


Complexity analysis

The algorithm runs in O(k·2^m) time, where k is the number of tuples in r and m is the number of primitive terms in Q. We need 2^m time and space to write down the terms in the normal forms, and we have to look at them for each tuple in r.
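A minimal Python transcription of Figure 3.2 follows, as a sketch under our own assumptions rather than the thesis's code: tuples are dicts, $\vartheta$ and $\omega$ are sentinel objects, a primitive term is an (attribute, test) pair that is unknown exactly when its attribute is missing, and the normal forms are built by truth-table enumeration (hence the $2^m$ factor). The tuples other than $t_3$ are hypothetical.

```python
from itertools import product

MISSING = object()       # ϑ
INCONSISTENT = object()  # ω


def term_truth(tup, term):
    """Primitive term on a tuple: True/False, or None (unknown) on a missing value."""
    attr, test = term
    value = tup[attr]
    return None if value is MISSING else bool(test(value))


def literal(truth, positive):
    """Value of p_j (positive) or ¬p_j; the negation of unknown stays unknown."""
    return truth if (positive or truth is None) else (not truth)


def evaluate_query(relation, terms, formula):
    n = len(terms)
    rows = list(product((False, True), repeat=n))
    pdnf = [a for a in rows if formula(*a)]      # minterms: p_j unnegated iff a[j]
    pcnf = [a for a in rows if not formula(*a)]  # maxterms: p_j unnegated iff not a[j]
    true_result, maybe_result = [], []
    for t in relation:
        if any(v is INCONSISTENT for v in t.values()):
            continue  # a tuple with ω is in neither result
        truths = [term_truth(t, term) for term in terms]
        # Theorem 3.1(2) with Lemma 3.2: every maxterm has a true literal.
        if all(any(literal(truths[j], not a[j]) is True for j in range(n)) for a in pcnf):
            true_result.append(t)
        # Theorem 3.1(1) with Lemma 3.1: some minterm has no false literal.
        elif any(all(literal(truths[j], a[j]) is not False for j in range(n)) for a in pdnf):
            maybe_result.append(t)
    return true_result, maybe_result


# Example 2: p1 = (#ofdoors = 2), p2 = (year = 80), p3 = (mpg > 20).
terms = [("doors", lambda d: d == 2), ("year", lambda y: y == 80), ("mpg", lambda m: m > 20)]
formula = lambda p1, p2, p3: (p1 and p2) or (not p1 and p3)

t3 = {"year": 80, "mpg": 26, "doors": MISSING}
t_maybe = {"year": 79, "mpg": MISSING, "doors": 4}    # hypothetical
t_false = {"year": 79, "mpg": 15, "doors": 2}         # hypothetical
t_bad = {"year": 80, "mpg": INCONSISTENT, "doors": 4}  # hypothetical
tr, mr = evaluate_query([t3, t_maybe, t_false, t_bad], terms, formula)
print(len(tr), len(mr))  # 1 1
```

No null is ever replaced by a domain value; the only exponential work is writing down the normal forms, as in the complexity analysis.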

The complexity analysis of the algorithm we developed shows clearly that it is a superior query evaluation method compared with the direct application of the query definition. Most of the work is done at the syntactic level and no substitutions of values are needed. Still, the number of terms in the normal forms grows exponentially with the number of primitive terms in the query. Thus, for queries involving quantifiers (series of disjunctions/conjunctions), it is impractical to use our query evaluation algorithm.

We now show that there is very little hope of finding a substantially better algorithm for query evaluation.

Theorem 3.2

Query evaluation is co-NP complete.

Proof

Consider an arbitrary propositional calculus formula $F$. The propositional variables $x_j$ in $F$ take values from the domain $D = \{0, 1\}$. We transform this formula to a query $Q$ in the obvious way. If the variable $x_j$ is in $F$, then we replace it by the primitive term $(A_j = 0)$. If the negation of $x_j$ ($\neg x_j$) is also in $F$, then we replace the negation by the term $(A_j = 1)$. Consider now tuples $t$ defined over the set of attributes $X = \{A_1, A_2, \ldots, A_k\}$ which appear in $Q$. Assume that $t[X] = (v_1, v_2, \ldots, v_k)$ where, for all $v_i$, $1 \le i \le k$, we have $v_i = \vartheta$. We show that $Q(t)$ is true iff $F$ is a tautology. The if-part is trivial. For the only-if part, according to Definition 3.1, $Q$ must be true on $t$ for all substitutions of the null values in $t$. Thus, there is no combination of values for $t$ that makes $Q$ false. Therefore $F$ is a tautology. It has been shown in [Cook 71] that the non-tautology problem is NP-complete. Hence, the tautology problem in propositional logic, and consequently query evaluation, is co-NP complete. ■
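The reduction in the proof can be exercised directly: on a tuple whose attributes are all $\vartheta$ over $D = \{0, 1\}$, the completions are exactly the truth assignments, so the extended answer is true precisely for tautologies. A sketch following the proof's encoding ($x_j$ read as $(A_j = 0)$, $\neg x_j$ as $(A_j = 1)$); the function name is ours, and its $2^k$ enumeration illustrates the cost the theorem points to.

```python
from itertools import product
from typing import Callable


def evaluate_all_missing(query: Callable[..., bool], k: int) -> str:
    """Extended value of Q on a tuple whose k attributes are all ϑ, over D = {0, 1}.

    Each completion assigns 0 or 1 to A_1..A_k, i.e. it is one truth assignment,
    so the answer is "true" exactly when the encoded formula is a tautology.
    """
    outcomes = {bool(query(*values)) for values in product((0, 1), repeat=k)}
    if outcomes == {True}:
        return "true"
    if outcomes == {False}:
        return "false"
    return "unknown"


# Encoding from the proof: x_j becomes (A_j = 0) and ¬x_j becomes (A_j = 1).
excluded_middle = lambda a1: (a1 == 0) or (a1 == 1)  # x1 ∨ ¬x1
disjunction = lambda a1, a2: (a1 == 0) or (a2 == 0)  # x1 ∨ x2
print(evaluate_all_missing(excluded_middle, 1))  # true
print(evaluate_all_missing(disjunction, 2))      # unknown
```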

It seems reasonable to believe that the number of terms in the principal normal forms could be minimized. This is not an easy task, though, since it clearly contains the classical problem of minimizing Boolean functions. This last problem is referred to as the Minimum Disjunctive Normal Form problem in [Garey and Johnson 79] and is shown to be NP-hard. Some interesting results may be derived from an average-case analysis of our query evaluation algorithm, but no such analysis is considered in this thesis.


3.3.  Modification  Operations 

In this section we discuss modification operations. First, some basic definitions and facts.

Let $r$ be a relation instance of $R$, possibly with null values. We define, analogously to the definitions of the sets $AP(t)$ and $COMPL(t)$ from section 2.3, the set of approximations and the set of completions of the relation instance $r$.

$AP(r)$ := the set of relation instances $s$ where $s$ is defined as follows:

A1. $\forall t \in r$, for exactly one $u \in AP(t)$, $u \in s$;

A2. no other tuples are in $s$.

$COMPL(r)$ := the set of relation instances $s$ where $s$ is defined as follows:

C1. $\forall t \in r$, for exactly one $u \in COMPL(t)$, $u \in s$;

C2. no other tuples are in $s$.

From the above definitions it is easily derived that, if $r$ is not empty ($|r| \ge 1$), then for every $s$ in the sets $COMPL(r)$ and $AP(r)$ the maximum number of tuples in $s$ is $|r|$ and the minimum number of tuples is 1. The latter occurs when $\exists t : t \sqsubseteq t'$, $\forall t' \in r$, and the former occurs when all tuples in $r$ are incompatible in approximations.


We define approximation between relation instances as follows. Let $r, r'$ be instances of $R$. We say that $r$ approximates $r'$, denoted by $r \sqsubseteq r'$, when both:

1. $|r| \le |r'|$

2. $\forall t \in r \; \exists t' \in r' : t \sqsubseteq t'$

Note that condition 1 is necessary in our definition if we want the set of instances of $R$ to be a lattice with the approximation ordering. When condition 1 is not present, antisymmetry fails, as the following example illustrates. Consider approximation defined only with condition 2 and the two instances of $R$, $r = \{(a, \vartheta)\}$ and $r' = \{(a, \vartheta), (\vartheta, \vartheta)\}$. We have $r \sqsubseteq r'$ and $r' \sqsubseteq r$, but $r \ne r'$.

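Proposition 3.2

$r \sqsubseteq r' \iff \forall s \in COMPL(r), \exists s' \in COMPL(r') : s \subseteq s'$

Proof (trivial). ■

Note that when $r \subseteq r'$, then $r \sqsubseteq r'$. From the above proposition, if $r$ and $r'$ have no $\vartheta$'s, $r \sqsubseteq r'$ iff $r \subseteq r'$.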

Approximation as defined above makes the set of all instances of $R$ a lattice. The two operations (meet and union) are defined as follows.

$union(r, r') := \{u \mid u \in r \text{ or } u \in r'\}$

$meet(r, r')$ := all tuples $u$ defined by the rules:

1. if $u \in r$ and $u \in r'$, then $u \in meet(r, r')$, $r := r - \{u\}$, $r' := r' - \{u\}$;

2. if $u \in r$ and $\exists t' \in r' : u \sqsubseteq t'$, then $u \in meet(r, r')$, $r := r - \{u\}$, $r' := r' - \{t'\}$;

3. if $u \in r'$ and $\exists t \in r : u \sqsubseteq t$, then $u \in meet(r, r')$, $r' := r' - \{u\}$, $r := r - \{t \mid u \sqsubseteq t\}$;

4. $u = glb\{t \mid t \in r \cup r'\} \in meet(r, r')$;

5. no other tuples are in $meet(r, r')$.

The definition of union is straightforward. The result of the meet operation is a relation instance which has all the common tuples of $r$ and $r'$ (from rule 1). In addition, it has the more complete among compatible tuples from the two instances (rules 2 and 3). Finally, for the remaining incompatible tuples, the more complete tuple that approximates all of them (glb) is in $meet(r, r')$. From the above definitions it is readily seen that $r, r' \sqsubseteq union(r, r')$ and $meet(r, r') \sqsubseteq r, r'$. The definitions of meet and union are easily generalized to many relations.
For a tuple $t = (v_1, v_2, \ldots, v_n)$, defined on a relation scheme $R = \{A_1, A_2, \ldots, A_n\}$, we define a characteristic formula, $F_t$, as follows:

$F_t := (A_1 = v_1) \wedge (A_2 = v_2) \wedge \cdots \wedge (A_n = v_n)$ †

Furthermore, we define the following sets of instances of $R$:

$In_t := \{r \mid F_t(r) \ge 1\}$, with $meet(In_t) = \{t\}$

$Out_t := \{r \mid F_t(r) = 0\}$, with $union(Out_t) = D_R - \{t\}$.

These are the sets of instances of $R$ where $t$ is present and where it is not present, respectively. A formula (e.g. $F_t$) has been defined (chapter 2) only on tuples. We abuse the notation here by applying $F_t$ to relation instances: $F_t(r)$ denotes the number of tuples $u$ in $r$ for which $F_t(u) = true$. We are now ready to define formally the operations of insertion and deletion of tuples in instances of $R$.

The insertion operation is defined as a function from an instance of $R$ to another instance, $Insert_t : r \to r'$. We give the properties of $r'$.

(a) The new instance $r'$ should say no less than the affirmation of the appearance of $t$. Formally,

$F_t(r') \ge 1 \Rightarrow r' \in In_t \Rightarrow \{t\} \sqsubseteq r'$

(b) The new instance $r'$ should contain more information than the old state $r$; insertions always add information. Formally,

$r \sqsubseteq r'$

(c) The new instance $r'$ should be the least instance with the above properties, i.e. no side effects are desired in an insertion. Formally,

$r' := union(r, \{t\})$

Following King's terminology [King 78], we say that the insertion has weak semantics if $r'$ only satisfies property (a). Note that in this case some other tuples may have been inserted in $r$ in addition to $t$. Furthermore, some tuples of $r$ may have been deleted. When $r'$ has both properties (a) and (b), we say that the insertion has medium semantics. This guarantees that no tuples were lost because of the insertion of $t$. Finally, the insertion has strong semantics if $r'$ satisfies all three properties.

† The nulls $\vartheta$ and $\omega$ may appear among the values $v_i$.

Similarly, we define deletion as a function from relation instances to relation instances, $Delete_t : r \to r'$, having the following properties.

(a) The tuple $t$ should not be in the new instance $r'$. Formally,

$F_t(r') = 0 \Rightarrow r' \in Out_t \Rightarrow r' \sqsubseteq D_R - \{t\}$

(b) The new instance $r'$ should have less information than $r$. Formally,

$r' \sqsubseteq r$

(c) The new instance $r'$ should be the maximal instance with the above properties. Formally,

$r' := meet(r, (D_R - \{t\}))$

It is common to consider an update as a deletion followed by an insertion. For such cases the definition of update semantics is trivial in our framework. On the other hand, an update cannot always be considered as a deletion followed by an insertion [Bernstein and Goodman 80]. An update, taken as a transaction, requires the consideration of only the initial and resulting relation instances for consistency. But deletions followed by insertions introduce other, intermediate instances whose consistency must be guaranteed for the operations to be applied.

Our presentation of modification semantics does not consider the user's intention with respect to the operation. In particular, when null values are formally treated in the database, the change of a value may have a different semantic interpretation than in the absence of nulls. It may not be that the real world has changed, but that the "knowledge" of the database about the real world has changed, and this is the change we record. For example, we now know that John is married, and instead of having $\vartheta$ for his marital-status we can use the value married. Hence, we can distinguish between two ways of semantically interpreting changes at a high level. In our presentation we make no such distinction explicitly.

Before the conclusion of this chapter, we discuss a generalization of a fundamental assumption in database management: the closed-world assumption. Under this assumption, negative facts in a database can be deduced from the positive facts, hence only the latter need to be stored. In particular, consider a relational database and the interpretation of a tuple as a formula of predicate logic. The tuple is false on an instance $r$ if it does not appear in $r$. For example, the fact that John is not married can be deduced from the fact that the tuple (John, married) is not in the database (relation). The above assumption is called the closed-world assumption [Reiter 78]. It is contrasted with the open-world assumption, which states that all negative facts must be stored explicitly.

When the null value is allowed in a database, the closed-world assumption is not correct, at least in its present form. For an illustration, consider an instance $r$ of $R = \{name, marital\text{-}status\}$ with only one tuple $t = (John, \vartheta)$. Suppose that we want to establish the fact that John is not married. Under the closed-world assumption, since (John, married) is not in $r$, we can deduce that John is not married. This is not correct. The $\vartheta$ in the tuple (John, $\vartheta$) may very well stand for married. The correct version of the closed-world assumption in our framework is stated formally as

$t$ is not in $r$ iff there exists no $t' \in r : t' \sqsubseteq t$

Informally, the generalized closed-world assumption states that we may deduce with certainty that a tuple is not in an instance only when none of its approximations is (set-theoretically) in the instance. Hence, in the above example, we cannot deduce that John is not married, but we can deduce that Peter is not married.
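The generalized closed-world assumption reduces to a simple check once tuple approximation is available. A sketch, assuming $\omega$-free tuples, $\vartheta$ as a sentinel object, and $u \sqsubseteq t$ meaning that every value of $u$ is either $\vartheta$ or equal to the corresponding value of $t$ (condition 1 of Proposition 3.1 restricted to $\vartheta$); the function names are ours.

```python
MISSING = object()  # sentinel for the missing value ϑ


def approximates(u, t) -> bool:
    """u ⊑ t: every value of u is ϑ or equal to the corresponding value of t."""
    return all(a is MISSING or a == b for a, b in zip(u, t))


def certainly_absent(t, r) -> bool:
    """Generalized closed-world assumption: t is certainly not in r
    iff no tuple of r approximates t."""
    return not any(approximates(u, t) for u in r)


r = [("John", MISSING)]
print(certainly_absent(("John", "married"), r))   # False
print(certainly_absent(("Peter", "married"), r))  # True
```

On null-free instances `approximates(u, t)` holds only when `u == t`, so the check collapses to the classical closed-world assumption.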
PART III


THEORY  OF  FUNCTIONAL  DEPENDENCIES  AND  DATABASE  DESIGN 


Functional dependencies are statements about the real world. At the same time, they can be thought of as constraints on the data which represent the real world in the database. These constraints have been formally defined and their properties have been investigated in relations with no null values. Relational database design is based on the concept of functional dependencies and, in particular, on the existence of inference rules (Armstrong's rules) for them which are sound and complete. In chapter 4 we discuss dependencies defined on relations where null values are allowed. We show how to test for satisfiability of these dependencies. It is also shown that Armstrong's inference rules are sound and complete for functional dependencies in this framework. Chapter 5 discusses the universal relation model in relational database design and the difficulties it introduces. Explicit null values are not considered in this context. We show that most of the problems introduced by the universal relation assumption are eliminated when a weaker notion of the assumption is considered. This requires the theory developed in chapter 4.


Chapter  4 


Functional  Dependency  Extensions 


Thou  hast  seen  nothing  yet 
-  Cervantes 


4.1.  Introduction 

Work on relational database design started soon after the publication of the pilot papers on the relational model [Codd 70], [Codd 72]. Normalization, which is a relational schema design process, centers around the notion of data dependencies, a purely syntactic notion that has been introduced to capture semantics in a relational database. These dependencies are used as guidelines for the design of a relational schema which is conceptually meaningful and is free of certain update anomalies [Date 77]. Furthermore, dependencies are integrity constraints on the resulting schema. As such, they must be satisfied in any instance of the database. The theory of dependencies, in particular of the functional ones, has been studied in depth [Beeri et al 78].

Data dependencies are defined in a context of no nulls. In order to allow for nulls we must carefully redefine dependencies, more precisely their interpretations, together with their requirements of satisfiability and inference rules. The satisfiability requirements give us a pattern of allowable nulls in a (universal) relation. We note here that the nothing null value, which denotes inconsistencies, may in general appear in a database, but it has no place in a database where semantic rules are required to be valid. Hence, in the present context we only consider missing data values.

In this chapter, we extend the notion of a functional dependency (FD) interpretation to apply to nulls. We then present satisfiability requirements and give necessary and sufficient conditions under which these requirements are met. Finally, the properties of FDs are examined and inference rules, which are shown to be sound and complete, are presented. We claim that only after having such results is it conceivable to talk safely about decompositions and normalization theory when nulls are allowed in relation instances.

Functional dependencies are discussed in section 4.2. In the third section of this chapter, the interpretation of the dependencies is redefined to apply to nulls, and conditions are presented for the FDs to meet satisfiability requirements. Section 4.4 examines properties of FDs, specifically inference rules. In this section we depart from our system to work with an equivalent but well-axiomatized propositional logic system. The technique is the same as the one used in [Fagin 77]. Ways of efficiently testing for FD-satisfiability are presented in section 4.5. In addition, the notion of a "minimally incomplete" relation instance is introduced, and the rules for reaching such an instance are shown to constitute a finite Church-Rosser system. In the concluding remarks of this chapter we summarize the results and discuss their importance.

4.2. Functional Dependencies and their Interpretation

Let R be a relation scheme and X, Y be sets of attributes in R (not necessarily distinct). A functional dependency (FD), denoted by f: X → Y, or simply f, is a statement about R. For example, consider the relation scheme in figure 4.1 and the statement: "Employees have only one salary and work in only one department." The expression of this semantic rule is the functional dependency E# → SL,D#. Accordingly, the interpretation of an FD is a predicate on instances of R, defined as:


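$$
f(r) = \begin{cases}
\mathit{true} & \text{if for every } t, t' \text{ in } r, \text{ either } t[X] \neq t'[X], \text{ or, if } t[X] = t'[X], \text{ then } t[Y] = t'[Y] \\
\mathit{false} & \text{otherwise}
\end{cases}
$$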


We say that f holds (or is satisfied, or is true) in a relation instance r if f(r) is equal to true. Furthermore, f is valid for R if it holds in all its instances. It is trivial to verify that the functional dependencies E# → SL,D# and D# → CT hold in the instance r of figure 4.2.


For convenience, we now modify our notation to have f defined as a function with two arguments, a tuple and a relation instance. Hence,

$$
f(t, r) = \begin{cases}
\mathit{true} & \text{if for every } t' \text{ in } r, \text{ either } t[X] \neq t'[X], \text{ or, if } t[X] = t'[X], \text{ then } t[Y] = t'[Y] \\
\mathit{false} & \text{otherwise}
\end{cases}
$$

and we say that f holds in r if f(t, r) = true for every t in r.
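As an illustration of the two-argument interpretation, here is a minimal Python sketch for null-free relations. It assumes (our choice, not the thesis's) that a relation instance is a list of dicts from attribute names to values and that X and Y are tuples of attribute names. The employee rows are hypothetical, since the contents of figure 4.2 are not reproduced here.

```python
def fd_on_tuple(t, r, X, Y):
    """f(t, r): true unless some t' in r agrees with t on X but not on Y."""
    for u in r:
        agree_on_x = all(t[a] == u[a] for a in X)
        agree_on_y = all(t[a] == u[a] for a in Y)
        if agree_on_x and not agree_on_y:
            return False
    return True


def fd_holds(r, X, Y):
    """f holds in r iff f(t, r) is true for every tuple t in r."""
    return all(fd_on_tuple(t, r, X, Y) for t in r)


# Hypothetical instance of R(E#, SL, D#, CT) from figure 4.1.
employees = [
    {"E#": 1, "SL": 900, "D#": 10, "CT": "permanent"},
    {"E#": 2, "SL": 700, "D#": 10, "CT": "permanent"},
    {"E#": 3, "SL": 800, "D#": 20, "CT": "temporary"},
]
broken = employees + [{"E#": 4, "SL": 750, "D#": 20, "CT": "permanent"}]

print(fd_holds(employees, ("E#",), ("SL", "D#")))  # True
print(fd_holds(employees, ("D#",), ("CT",)))       # True
print(fd_holds(broken, ("D#",), ("CT",)))          # False: D# 20 has two contract types
```

The check compares every pair of tuples, so it is quadratic in the size of r; sorting or hashing on X would be the usual way to speed it up.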

When a set of FDs holds in a relation instance r, there are usually some other dependencies that also hold in r. More formally, a functional dependency f is implied by a set of FDs F = {f1, f2, ..., fk} if there is no counterexample relation r' such that F holds in r' but f does not hold in r'. A very important result, which constitutes the basis for much research on FDs, is that Armstrong's inference rules are sound and complete for functional dependencies (figure 4.3).
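Implication can be decided without enumerating counterexample relations. A standard technique, not spelled out in this chapter, computes the closure X+ of X under F by repeatedly applying the rules of figure 4.3; X → Y is then implied by F exactly when Y ⊆ X+. The sketch below assumes, as its own representation, that each FD is a pair of frozensets of attribute names.

```python
def closure(X, F):
    """Attribute closure X+ of X under the FDs in F (pairs of frozensets)."""
    result = set(X)                      # reflexivity: X -> X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            # If X -> lhs is derivable, transitivity and union give X -> rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(F, X, Y):
    """True iff X -> Y is implied by F, i.e. Y is contained in X+."""
    return set(Y) <= closure(X, F)


F = [(frozenset({"E#"}), frozenset({"SL", "D#"})),
     (frozenset({"D#"}), frozenset({"CT"}))]

print(sorted(closure({"E#"}, F)))  # ['CT', 'D#', 'E#', 'SL']
print(implies(F, {"E#"}, {"CT"}))  # True, by transitivity [FD3]
print(implies(F, {"D#"}, {"SL"}))  # False
```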


Figure 4.1 An example relation scheme and functional dependencies.

R(E#, SL, D#, CT)

Attribute explanation

E# - employee serial number
SL - salary
D# - department number of the employee
CT - contract type

Functional dependencies defined: f1: E# → SL,D# and f2: D# → CT


Figure 4.2 An instance of the relation scheme R.


Figure 4.3 Sound inference rules for FDs.

[FD1]: (reflexivity) if Y ⊆ X then X → Y
[FD2]: (augmentation) if Z ⊆ W and X → Y then XW → YZ
[FD3]: (transitivity) if X → Y and Y → Z then X → Z
[FD4]: (pseudo-transitivity) if X → Y and YW → Z then XW → Z
[FD5]: (union) if X → Y and X → Z then X → YZ
[FD6]: (decomposition) if X → YZ then X → Y and X → Z

{FD1, FD2, FD3} and {FD1, FD3, FD5, FD6} are complete sets of rules.


Figure 4.4 An instance of R with nulls.


We now make two observations commonly used for proofs about FDs (sometimes implicitly). Let 2T = {s | s ⊆ r, |s| = 2} (i.e. 2T is the set of all the two-tuple subrelations of a relation instance r).

[1] The functional dependency f holds in r iff f holds in every (two-tuple) relation in 2T.

[2] The functional dependency f is implied by a set of functional dependencies F iff f is implied by F in the world of two-tuple relations.

These observations allow us to consider only two-tuple relations in proofs about functional dependencies without loss of generality. We will see, however, that these observations are not always correct when null values are allowed.


4.3. Functional Dependencies in Relations with Null Values

From this point on, we assume that nulls are allowed in relation instances. An example of such an instance is given in figure 4.4. To extend the notion of a functional dependency (more precisely, its interpretation as a function) we use the least extension rule. Let f: X → Y be a functional dependency, XY ⊆ R, r be a relation, and t a tuple in r.

$$
f^{\circ}(t, r) = \begin{cases}
f(t, r) & \text{if all values in } t[XY] \text{ and } r[XY] \text{ are not null} \\
\mathrm{lub}\,\{\, f(t', r') \mid r' \in \mathit{COMPL}(r[XY]),\ t' \in \mathit{COMPL}(t[XY]) \,\} & \text{otherwise}
\end{cases}
$$

We use f° to denote the extension of f, but from now on we will drop the ° for simplicity.
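For small finite domains, the least extension can be computed literally: enumerate every completion, evaluate the classical f on each, and take the lub of the outcomes. The following is a minimal sketch under our own assumptions: nulls are represented by Python's None, each attribute domain is given explicitly as a list, truth values are the strings "true", "false" and "unknown", and each null is substituted independently. Enumeration is exponential in the number of nulls, so this is only a reference for checking small cases.

```python
from itertools import product

NULL = None  # assumed representation of a missing value


def classical_f(t, r, X, Y):
    """f(t, r) on a null-free relation: no t' agrees with t on X but not on Y."""
    return all(any(t[a] != u[a] for a in X) or all(t[a] == u[a] for a in Y)
               for u in r)


def completions(r, attrs, domains):
    """Yield every completion of r, substituting each null in attrs independently."""
    holes = [(i, a) for i, row in enumerate(r) for a in attrs if row[a] is NULL]
    for choice in product(*(domains[a] for _, a in holes)):
        rows = [dict(row) for row in r]
        for (i, a), value in zip(holes, choice):
            rows[i][a] = value
        yield rows


def lub(outcomes):
    """lub of a set of truth values: one value stays as is, a mixture is unknown."""
    return next(iter(outcomes)) if len(outcomes) == 1 else "unknown"


def extended_f(i, r, X, Y, domains):
    """f(t, r) for t = r[i], extended by the least extension rule."""
    attrs = list(X) + list(Y)
    outcomes = {"true" if classical_f(rows[i], rows, X, Y) else "false"
                for rows in completions(r, attrs, domains)}
    return lub(outcomes)


domains = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}
X, Y = ("A",), ("B",)
shared = [{"A": "a1", "B": NULL}, {"A": "a1", "B": "b1"}]
unique = [{"A": "a1", "B": NULL}, {"A": "a2", "B": "b1"}]
print(extended_f(0, shared, X, Y, domains))  # unknown: b1 keeps f true, b2 makes it false
print(extended_f(0, unique, X, Y, domains))  # true: t[X] is unique in r
```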

The above definition is refined on a case-by-case basis (considering the null as one of the t[X] values or as one of the t[Y] values) to establish necessary and sufficient conditions for an FD to take a particular truth value. Before we present these conditions formally, we give an informal explanation. Recall that the value of f(t, r), with a null appearing in t[XY], is false (true) only when it evaluates to false (true) for all substitutions of the null value. For the discussion below, the concept of a finite attribute domain and its size is important. Further restrictions on domains will be presented and justified.

Assume first that the null appears in t[Y] and that Y has only one attribute. Trivially, f(t, r) = true whenever t[X] appears uniquely in r. When t[X] is not unique in r, say t[X] = t'[X] for some tuple t' in r, we may not claim that f(t, r) = true. It is possible to substitute the null in t[Y] so that f(t, r) is false with the substituted value (e.g. any value that makes t[Y] ≠ t'[Y]). However, we also may not claim that f(t, r) = false, since we can make it true if we substitute t[Y] with t'[Y]. Hence, depending on how we substitute for the null, the FD is either true or false. Since we do not know what the actual value of the null is, we take lub{true, false} = unknown as the value of the FD on t and r.

Assume now that the null appears in t[X] (X can consist of more than one attribute). The dependency will evaluate to true in two cases. First, there is no tuple in r whose projection on X is a completion of t[X] (i.e. any time we substitute for the null we end up with a unique t[X] among the tuples of r). In this case, f(t, r) = true trivially. Second, for each tuple t' in r such that t'[X] is a completion of t[X], we also have t'[Y] = t[Y]. There is exactly one case where the value of the FD on t and r is false because of a null in t[X]. This case arises when we run out of domain values while attempting substitutions of this null that keep the dependency from being false, i.e. substitutions for which the dependency is true. For this to happen, it must be the case, first, that all completions of t[X] appear in r. (Otherwise, we may substitute for the null and create a completion that does not appear in r, thus ensuring that f(t, r) ≠ false.) In addition, since for all such completions we must ensure that the dependency is false, it is required that the t[Y] value is unique among all the t'[Y] values, where t'[X] is a completion of t[X] that appears in r. This is the only case where a null in a tuple t makes the value of f(t, r) identically equal to false. Formally:

Proposition 4.1

Let R be a relation scheme, X, Y ⊆ R such that X ∩ Y = ∅ and X ∪ Y = R, f: X → Y be a functional dependency in R, r be an instance of R, and t a tuple of r. Assume that r − {t} has no nulls. (Alternatively, consider all completions of r − {t} iteratively.) In addition, assume that the number of tuples in r is greater than one (non-trivial cases).

f(t, r) = true iff one of the following conditions holds:

[T1] t[XY] has no nulls and there exists no t' in r such that t'[X] = t[X] and t'[Y] ≠ t[Y].

[T2] t[Y] has a null, t[X] has no nulls and there exists no t' in r (t' ≠ t) such that t'[X] = t[X].

[T3] t[X] has a null, t[Y] has no nulls and either no completion of t[X] is in r, or, if a completion of t[X] is in r, say t'[X], then t[Y] = t'[Y].

f(t, r) = false iff one of the following conditions holds:

[F1] t[XY] has no nulls and there exists a tuple t' in r such that t[X] = t'[X] and t[Y] ≠ t'[Y].

[F2] t[X] has a null, t[Y] has no nulls and both:

a.- all completions of t[X] appear in r,

b.- t[Y] is unique among all those completions (i.e. t[Y] ≠ t'[Y] for every t' in r such that t'[X] is a completion of t[X]).

f(t, r) = unknown otherwise.
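Proposition 4.1 replaces enumeration of substitutions with a direct case analysis. A minimal Python sketch of the conditions as stated might look as follows. It assumes the proposition's hypotheses (r − {t} null-free, more than one tuple) and, as in the informal discussion, a single null in t; nulls are None and finite domains are explicit lists (our assumptions). Completions of t[X] are still enumerated, which is only practical for small domains.

```python
from itertools import product

NULL = None  # assumed representation of a missing value


def project(t, attrs):
    return tuple(t[a] for a in attrs)


def x_completions(t, X, domains):
    """All completions of t[X], as tuples in the order of X."""
    choices = [domains[a] if t[a] is NULL else [t[a]] for a in X]
    return set(product(*choices))


def prop_4_1(t, r, X, Y, domains):
    """f(t, r) via [T1]-[T3], [F1], [F2]; assumes r - {t} null-free."""
    others = [u for u in r if u is not t]
    x_null = any(t[a] is NULL for a in X)
    y_null = any(t[a] is NULL for a in Y)
    if not x_null and not y_null:                       # [T1] or [F1]
        clash = any(project(u, X) == project(t, X) and project(u, Y) != project(t, Y)
                    for u in others)
        return "false" if clash else "true"
    if not x_null:                                      # null in t[Y]: [T2]
        shared_x = any(project(u, X) == project(t, X) for u in others)
        return "unknown" if shared_x else "true"
    if y_null:                                          # covered by no [T]/[F] case
        return "unknown"
    comps = x_completions(t, X, domains)
    matching = [u for u in others if project(u, X) in comps]
    if all(project(u, Y) == project(t, Y) for u in matching):
        return "true"                                   # [T3]
    all_present = {project(u, X) for u in matching} == comps
    if all_present and all(project(u, Y) != project(t, Y) for u in matching):
        return "false"                                  # [F2]
    return "unknown"


# Hypothetical instance: f: A -> B with only two values in the domain of A.
domains = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}
t1 = {"A": NULL, "B": "b3"}
r = [t1, {"A": "a1", "B": "b1"}, {"A": "a2", "B": "b2"}]
print(prop_4_1(t1, r, ("A",), ("B",), domains))  # false, by [F2]
```

Adding a third value a3 to the domain of A makes the same tuple evaluate to unknown, because substituting a3 yields a completion absent from r.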


Examples of the above are given in figure 4.5. We say that a functional dependency f (strongly) holds in an instance r if f(t, r) = true for every tuple t in r. In addition, we say that a functional dependency f weakly holds in an instance r if f(t, r) ≠ false for every tuple t in r. The second notion of satisfiability is justified intuitively since, in a framework of incomplete information, it is natural to weaken our expectations and allow for a margin of uncertainty in our semantic rules (as long as this does not lead to a certain denial, i.e. a contradiction, of the constraint).
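Under the least extension, both notions can be read off the completions of r: f strongly holds when every completion satisfies f classically, and weakly holds when, for each tuple separately, some completion makes f(t, r) true. The sketch below makes this per-tuple choice explicit. It assumes nulls are None, explicit finite domains, and independent substitution of each null; the instance is hypothetical.

```python
from itertools import product

NULL = None  # assumed representation of a missing value


def completions(r, attrs, domains):
    """Yield every completion of r, substituting each null in attrs independently."""
    holes = [(i, a) for i, row in enumerate(r) for a in attrs if row[a] is NULL]
    for choice in product(*(domains[a] for _, a in holes)):
        rows = [dict(row) for row in r]
        for (i, a), value in zip(holes, choice):
            rows[i][a] = value
        yield rows


def f_true(t, r, X, Y):
    """Classical f(t, r) on a null-free relation."""
    return all(any(t[a] != u[a] for a in X) or all(t[a] == u[a] for a in Y)
               for u in r)


def strongly_holds(r, X, Y, domains):
    """f(t, r) = true for every t, i.e. every completion satisfies f."""
    comps = list(completions(r, list(X) + list(Y), domains))
    return all(f_true(rows[i], rows, X, Y) for rows in comps for i in range(len(r)))


def weakly_holds(r, X, Y, domains):
    """f(t, r) != false for every t: each tuple has a completion making f(t, r) true."""
    comps = list(completions(r, list(X) + list(Y), domains))
    return all(any(f_true(rows[i], rows, X, Y) for rows in comps)
               for i in range(len(r)))


domains = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
r = [{"A": "a1", "B": NULL}, {"A": "a1", "B": "b1"}]
print(strongly_holds(r, ("A",), ("B",), domains))  # False: B = b2 would violate A -> B
print(weakly_holds(r, ("A",), ("B",), domains))    # True: B = b1 satisfies it
```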

Figure 4.5 Examples of FDs with nulls

Consider the relation scheme R(A, B, C), the FD f: AB → C, and the following four instances of R.

We use Proposition 4.1 (t1 represents the first tuple in each instance):
f(t1, r1) = true because of [T2]
f(t1, r2) = true because of [T3]
f(t1, r3) = true because of [T3]

Assume that for the instance r4 the domain of A has only two values: a1, a2.
f(t1, r4) = false because of [F2]


Before we conclude this section, let us again consider the two observations we made in the previous section about two-tuple relations. The observations allow us to determine whether a dependency holds or is implied by looking only at two-tuple relations. It can be trivially verified that both observations are valid when we consider the strong version of FD satisfiability.

On the other hand, they are both false when the weak notion is considered. A counter-example to the first observation is given by the last instance, r4, in figure 4.5. Note that any two-tuple combination in r4, considered independently, makes the FD f not false. But the dependency is false in the whole relation. To ensure that the observations are valid for both notions of satisfiability, we require that tuples containing nulls which make a dependency false for every possible substitution do not appear in r. The test to find these tuples is very hard, being domain-dependent. On the other hand, we now argue that, in practice, it is unlikely that such tuples will appear in a database. For the simple case where X has only one attribute, this argument is intuitively justified. The "bad" case [F2] of Proposition 4.1 requires all the domain values of the X attribute to be in r, and any tuple which has a null for X to disagree in the Y values with all tuples in r. This amounts to the requirement that the number of actual determining objects is smaller than the number of determined objects. For example, it would mean that a company gives more salaries than the number of employees it actually has! In a carefully designed database we would expect the domain of employee numbers to be sufficiently large, say, larger than the maximum number of tuples that may be inserted in the relation. Unfortunately, we cannot apply our intuition as smoothly when X has more attributes. After considering inference rules for FDs in the next section, we present in section 4.5 ways of efficiently testing weak and strong satisfiability for a set of FDs in a relation r.
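The failure of the first observation under weak satisfiability can be checked mechanically on a small instance in the spirit of r4. Its actual rows are not reproduced here, so the instance below is hypothetical: f: A → B with a two-value domain for A, where t1 has a null for A and a B-value that no other tuple has. The sketch assumes nulls are None, explicit domains, and the per-tuple reading of weak satisfiability.

```python
from itertools import combinations, product

NULL = None  # assumed representation of a missing value


def f_true(t, r, X, Y):
    """Classical f(t, r) on a null-free relation."""
    return all(any(t[a] != u[a] for a in X) or all(t[a] == u[a] for a in Y)
               for u in r)


def weakly_holds(r, X, Y, domains):
    """Each tuple t has some completion of r in which f(t, r) is true."""
    attrs = list(X) + list(Y)
    holes = [(i, a) for i, row in enumerate(r) for a in attrs if row[a] is NULL]
    comps = []
    for choice in product(*(domains[a] for _, a in holes)):
        rows = [dict(row) for row in r]
        for (i, a), value in zip(holes, choice):
            rows[i][a] = value
        comps.append(rows)
    return all(any(f_true(rows[i], rows, X, Y) for rows in comps)
               for i in range(len(r)))


domains = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}
r = [{"A": NULL, "B": "b3"}, {"A": "a1", "B": "b1"}, {"A": "a2", "B": "b2"}]

pairs_ok = all(weakly_holds(list(s), ("A",), ("B",), domains)
               for s in combinations(r, 2))
print(pairs_ok)                                  # True: every two-tuple subrelation is fine
print(weakly_holds(r, ("A",), ("B",), domains))  # False: [F2] applies to the first tuple
```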
4.4. Inference Rules for Functional Dependencies

One of the major applications of FDs is in the theory of normalization and schema design. In this section we will show how normalization theory and relational schema design can be applied in the presence of incomplete information. The examination of inference rules between FDs, and in particular the establishment of sound and complete inference rules, is of prime importance for this purpose. For the sake of simplicity in proofs (especially completeness proofs) we will obtain our results by reduction to a system which is equivalent to our system of extended FDs. We will first show an equivalence between our system and a well-axiomatized propositional logic system. The equivalence holds between functional dependencies in our system and implication statements in the propositional logic. This equivalence will allow us to conclude that rules which are sound and complete for the implication statements have the same property for functional dependencies. Our approach is similar to that of [Fagin 77], but in a different environment.

System-C is a propositional logic system for unknown outcomes [Bertram 73]. It is a modal system which is not truth-functional. A unary operator, ∇, which reads as "necessarily true", is added to the traditional operators of negation, disjunction, etc. C has been axiomatized, and a set of axioms appears in the appendix. A detailed justification and explanation of the axioms is beyond the scope of this thesis. We only note that some of the axioms comprise a set of axioms for classical two-valued logic, thus ensuring that everything provable in two-valued logic is also provable in C. The rest of the axioms give C its modal interpretation and, in particular, the last axiom restricts C to a system of "logical necessity".

C has an unusual evaluation scheme† that uses the notion of two-valued tautologies.

† An evaluation scheme is a function V from the set of propositional variables to the set of truth values {true, false, unknown}.

Let P(p_1, p_2, ..., p_k) be a well-formed formula (wff) in C, expressed in terms of its atomic terms, and α = {α_i | i = 1, ..., k} an assignment of truth values to p_1, p_2, ..., p_k. The evaluation of P under α, denoted by V(P, α), or simply V(P) where α is understood, is defined by the following recursive rules:

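1.- If P is a tautology in the classical two-valued logic, then V(P) = true.

2.- If P = p_i for some i, then V(P) = α_i.

3.- If P = ¬Q, then

$$
V(P) = \begin{cases}
\mathit{true} & \text{if } V(Q) = \mathit{false} \\
\mathit{false} & \text{if } V(Q) = \mathit{true} \\
\mathit{unknown} & \text{otherwise}
\end{cases}
$$

4.- If P = Q ∧ S, then

$$
V(P) = \begin{cases}
\mathit{true} & \text{if } V(Q) \text{ and } V(S) \text{ are } \mathit{true} \\
\mathit{false} & \text{if } V(Q) \text{ or } V(S) \text{ is } \mathit{false} \\
\mathit{unknown} & \text{otherwise}
\end{cases}
$$

5.- If P = ∇Q, then

$$
V(P) = \begin{cases}
\mathit{true} & \text{if } V(Q) = \mathit{true} \\
\mathit{false} & \text{otherwise}
\end{cases}
$$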

Rule 1 is always applied first, and it is the reason that C is not truth-functional. The example here is p ∨ ¬p. It is a two-valued tautology, thus having the value true in C. But if evaluated without rule 1 (with p unknown), it has the value unknown. Notice also that C makes reductions before evaluating formulas. Thus, (p ∧ q) ∨ (p ∧ ¬q) will be recognized to be equal to p before its evaluation.
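The evaluation rules can be prototyped directly. The sketch below is our own illustration, not Bertram's formulation. Formulas are nested tuples. Rule 1 is checked first, by brute-force enumeration of two-valued assignments, and as a simplifying assumption only for formulas without the necessity operator (written "nec"). Disjunction is evaluated as ¬(¬Q ∧ ¬S), so rule 1 also applies to the intermediate negations. The reductions mentioned above are not modelled.

```python
from itertools import product

T, F, U = "true", "false", "unknown"


def variables(p):
    """Set of propositional variables occurring in formula p."""
    if p[0] == "var":
        return {p[1]}
    return set().union(*(variables(q) for q in p[1:]))


def classical(p, env):
    """Two-valued evaluation; env maps variables to bool."""
    op = p[0]
    if op == "var":
        return env[p[1]]
    if op == "not":
        return not classical(p[1], env)
    if op == "and":
        return classical(p[1], env) and classical(p[2], env)
    if op == "or":
        return classical(p[1], env) or classical(p[2], env)
    raise ValueError("classical logic has no operator " + op)


def has_nec(p):
    return p[0] == "nec" or (p[0] != "var" and any(has_nec(q) for q in p[1:]))


def tautology(p):
    """Rule 1 test by enumeration; formulas with nec are never treated as tautologies here."""
    if has_nec(p):
        return False
    vs = sorted(variables(p))
    return all(classical(p, dict(zip(vs, bits)))
               for bits in product([True, False], repeat=len(vs)))


def V(p, alpha):
    """Evaluation of a wff of C under alpha (variable -> T/F/U)."""
    op = p[0]
    if op == "or":                 # Q or S is taken as not(not Q and not S)
        return V(("not", ("and", ("not", p[1]), ("not", p[2]))), alpha)
    if tautology(p):               # rule 1, always applied first
        return T
    if op == "var":                # rule 2
        return alpha[p[1]]
    if op == "not":                # rule 3
        return {T: F, F: T}.get(V(p[1], alpha), U)
    if op == "and":                # rule 4
        q, s = V(p[1], alpha), V(p[2], alpha)
        if q == T and s == T:
            return T
        return F if F in (q, s) else U
    if op == "nec":                # rule 5
        return T if V(p[1], alpha) == T else F
    raise ValueError("unknown operator " + op)


p = ("var", "p")
excluded_middle = ("or", p, ("not", p))
print(V(excluded_middle, {"p": U}))           # true, by rule 1
print(V(("and", p, ("not", p)), {"p": U}))    # unknown
print(V(("nec", p), {"p": U}))                # false
print(V(("nec", excluded_middle), {"p": U}))  # true
```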

Lemma 4.1

The function V can be derived as the least extension of the evaluation function F in a classical two-valued logic system.

Proof

(Sketch) It can be easily verified that the truth tables based on the least extension rule are the same as the truth tables of V. Of course, the former tables are only used for primitive terms. However, observe that they can also be used for general wffs with only one exception: tautologies in two-valued logic. The same applies for V and is taken care of by rule 1. The operator ∇ is not defined in classical logic; therefore its evaluation rule is arbitrary. ■

Corollary 4.1

The system we are using and C are equivalent in that they have the same evaluation scheme. ■

A C-tautology is a C wff which takes only the value true (under V). A C-theorem is a wff that is derived from the axioms of C. In [Bertram 73] it is proven that, given the particular evaluation scheme V, every C-tautology is a C-theorem and vice versa (soundness and completeness). The reason for introducing C will now become apparent.

Implication is defined in the regular way: P ⇒ Q := ¬P ∨ Q. We will consider a special type of implicational statement. Let A, B, ... denote propositional variables, and X, Y, Z denote conjunctive terms of propositional variables, i.e. X = A ∧ B, or simply X = AB. The implicational statements of interest (denoted by f) have the form X ⇒ Y. Notice the similarity with functional dependencies. From now on we will use the term "implicational statement" for any statement of this form. An implicational statement f is logically inferred by a set of implicational statements F = {f1, ..., fn} if, for every assignment of truth values α that gives all fi in F the value true, α(f) is also true. Similarly, we can define the notion of weak logical inference, where we relax our requirements by having α(f) ≠ false.




Lemma 4.2 (Implicational Completeness)

The following inference rules are sound and complete for implicational statements in C.

[I1] if Y ⊆ X then X ⇒ Y

[I2] if X ⇒ Y and Y ⇒ Z then X ⇒ Z

[I3] if X ⇒ Y and X ⇒ Z then X ⇒ YZ

[I4] if X ⇒ YZ then X ⇒ Y and X ⇒ Z

Proof

(a) Soundness

Trivially, all the rules hold for C, since everything provable in traditional two-valued logic (and the rules are provable) is also provable in C.

(b) Completeness

We want to show that, given a set of implicational statements F = {f1, ..., fn}, the rules produce all the logical inferences. We use a contradiction.

Assume that X ⇒ Y is a logical inference of F, but it cannot be derived from F via the inference rules. Let Cl(X) denote the set of all propositional variables (and conjunctive terms of them) Z such that X ⇒ Z can be proved from F via the rules. The set Cl(X) is not empty, since X ∈ Cl(X) by [I1]. Our assumption (to be contradicted) is that Y ∉ Cl(X). All we have to do is to find a particular truth assignment under which all the statements in F are true but the statement X ⇒ Y is false. Note that since the truth assignment is arbitrary, it is never necessary to assign the truth value unknown to the propositional variables.


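Consider the truth assignment α defined as follows:

$$
\alpha(A) = \begin{cases}
\mathit{true} & \text{if } A \in Cl(X) \\
\mathit{false} & \text{if } A \notin Cl(X)
\end{cases}
$$

Under the truth assignment α, the value of X ⇒ Y is false. Every variable of X belongs to Cl(X) (by [I1]), so X is true; and since Y ∉ Cl(X), some variable of Y does not belong to Cl(X) (otherwise Y ∈ Cl(X) by [I3]), so Y is false. By definition of implication, if X is true and Y is false then α(X ⇒ Y) = false.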
On the other hand, we now show that all implicational statements in F take the value true under α.

Let W ⇒ Z be an arbitrary statement in F. There are two cases:

1.- W ∈ Cl(X)

In this case, by definition of Cl(X), the statement X ⇒ W can be proved from F via the rules. Consequently, by application of transitivity (rule [I2]), X ⇒ Z can be proved via the rules. Hence, Z ∈ Cl(X). Finally, since α(W) = true and α(Z) = true, we have α(W ⇒ Z) = true.

2.- W ∉ Cl(X)

There must be an A in W such that α(A) = false. Consequently, α(W) = false. By definition of implication, α(W ⇒ Z) = true.

Thus α makes every statement in F true and X ⇒ Y false, contradicting the assumption that X ⇒ Y is a logical inference of F. ■
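The completeness argument is constructive, and its steps can be replayed on small inputs. The sketch below uses our own representation: statements are pairs of variable sets, read as conjunctions. It computes Cl(X) by closing under [I1]-[I4], builds the assignment α of the proof, and checks the proof's two claims. Because α uses only true and false, implications evaluate classically here.

```python
def cl(X, F):
    """Variables Z such that X => Z is derivable from F via [I1]-[I4]."""
    result = set(X)                       # [I1]
    changed = True
    while changed:
        changed = False
        for W, Z in F:
            if set(W) <= result and not set(Z) <= result:
                result |= set(Z)          # [I2], with [I3]/[I4] for conjunctions
                changed = True
    return result


def holds(alpha, W, Z):
    """Two-valued value of W => Z, i.e. not W or Z."""
    return not all(alpha[v] for v in W) or all(alpha[v] for v in Z)


def counter_assignment(F, X, Y, variables):
    """The proof's alpha (true exactly on Cl(X)); None if X => Y is derivable."""
    closure = cl(X, F)
    if set(Y) <= closure:
        return None
    alpha = {v: v in closure for v in variables}
    # Sanity checks of the proof's two claims.
    assert all(holds(alpha, W, Z) for W, Z in F)   # every statement in F is true
    assert not holds(alpha, X, Y)                  # but X => Y is false
    return alpha


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(counter_assignment(F, {"A"}, {"C"}, "ABCD"))  # None: A => C follows by [I2]
print(counter_assignment(F, {"B"}, {"A"}, "ABCD"))  # {'A': False, 'B': True, 'C': True, 'D': False}
```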

As we noted before, implicational statements syntactically resemble functional dependencies. With the above lemma we establish a set of inference rules that produce all, and only, the logical inferences among implicational statements. We now proceed to the major result, which is to show that the resemblance with functional dependencies is not coincidental. Rather, it is an equivalence.

Lemma 4.3

Let α be an assignment of truth values, s = {t, t'} a two-tuple relation, X → Y a functional dependency and X ⇒ Y the corresponding implicational statement. Suppose that, for every attribute A in XY, the following holds:

t[A] = t'[A] (both non-null) iff α(A) = true
t[A] ≠ t'[A] (both non-null) iff α(A) = false
t[A] or t'[A] = null iff α(A) = unknown

Then: X → Y strongly holds in s iff α(X ⇒ Y) = true.
Proof

(a) if

The assumption here is that α(X ⇒ Y) = true. Then we must have one of two cases.

1.- α(X) = false.

Then there exists an A in X such that α(A) = false. Hence t[A] ≠ t'[A], and the FD X → Y holds in s.

2.- α(Y) = true.

Then for every A in Y, α(A) = true. Hence, for every such A we have t[A] = t'[A]. That is, the FD is not violated irrespective of the X values in the two tuples.

(b) only-if

Assume now that the FD X → Y holds in s. According to Proposition 4.1 we must have one of three cases.

1.- [T1] no nulls in t[XY] and t'[XY]

In this case, either t[X] ≠ t'[X], or t[X] = t'[X] and also t[Y] = t'[Y]. For the former, there must be an A in X such that t[A] ≠ t'[A]. Consequently, α(A) = false and α(X ⇒ Y) = true. For the latter, both X and Y are true under α. Therefore, α(X ⇒ Y) = true.

2.- [T2] (reduces trivially to 1.)

3.- [T3] Again reduces to the two subcases of 1. Notice that we consider only one null at a time. Hence, if t[X] has a null and t'[X] is not a completion of t[X], then no matter what value is substituted we always have t[X] ≠ t'[X] (i.e. the reduction to 1). ■
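For single-attribute X and Y, Lemma 4.3 can be checked exhaustively on small domains. The sketch below derives α from a pair of tuples as the lemma prescribes and evaluates X ⇒ Y with rules 3 and 4 of C, taking ¬X ∨ Y as ¬(X ∧ ¬Y). Rule 1 plays no role here, since A ⇒ B is not a two-valued tautology. The result is compared with strong satisfiability computed by enumerating completions. Nulls are None and domains have two values (our assumptions); with only two tuples, the domain-exhaustion case [F2] cannot arise, consistent with the one-null-at-a-time reading of the proof.

```python
from itertools import product

NULL = None  # assumed representation of a missing value
T, F, U = "true", "false", "unknown"


def k_not(v):
    return {T: F, F: T}.get(v, U)


def k_and(values):
    values = list(values)
    if F in values:
        return F
    return T if all(v == T for v in values) else U


def alpha(t, u, a):
    """Truth value of attribute a as prescribed by Lemma 4.3."""
    if t[a] is NULL or u[a] is NULL:
        return U
    return T if t[a] == u[a] else F


def implication(t, u, X, Y):
    """alpha(X => Y), computed as not(X and not Y)."""
    x = k_and(alpha(t, u, a) for a in X)
    y = k_and(alpha(t, u, a) for a in Y)
    return k_not(k_and([x, k_not(y)]))


def strongly_holds(s, X, Y, domains):
    """Every completion of s satisfies X -> Y classically."""
    holes = [(i, a) for i, t in enumerate(s) for a in t if t[a] is NULL]
    for choice in product(*(domains[a] for _, a in holes)):
        rows = [dict(t) for t in s]
        for (i, a), v in zip(holes, choice):
            rows[i][a] = v
        for t, u in product(rows, repeat=2):
            if all(t[a] == u[a] for a in X) and any(t[a] != u[a] for a in Y):
                return False
    return True


def lemma_4_3_agrees():
    """Compare both sides of the lemma on every two-tuple relation over A, B."""
    X, Y = ("A",), ("B",)
    domains = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
    cells = {a: domains[a] + [NULL] for a in domains}
    for ta, tb, ua, ub in product(cells["A"], cells["B"], cells["A"], cells["B"]):
        t, u = {"A": ta, "B": tb}, {"A": ua, "B": ub}
        if strongly_holds([t, u], X, Y, domains) != (implication(t, u, X, Y) == T):
            return False
    return True


print(lemma_4_3_agrees())  # True
```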

Lemma 4.4

Consider the world of two-tuple relations. A functional dependency X → Y is inferred from a set of FDs F iff X ⇒ Y is a logical inference of F.




Proof
(a) only-if

Let the FD f: X → Y be inferred from F and assume that the corresponding statement X ⇒ Y is not a logical inference of F. We will derive a contradiction. All we have to do is find a two-tuple relation where all the dependencies in F hold but X → Y does not hold. Since X ⇒ Y is not a logical inference of F, there must be an assignment of truth values α such that α(fi) = true for every fi in F, but α(f) ≠ true (either false or unknown). Consider the two-tuple relation s = {t, t'} constructed as follows:

$$
t[A_i] = 1 \text{ for every } i, \qquad t'[A_i] = b_i, \quad \text{where } b_i = \begin{cases}
1 & \text{if } \alpha(A_i) = \mathit{true} \\
0 & \text{otherwise}
\end{cases}
$$

This construction is in accordance with Lemma 4.3. Hence, since all the corresponding implicational statements in F are true under α, all dependencies in F hold in s. Therefore f must also hold in s and, again by Lemma 4.3, f is true under α. This contradicts α(f) ≠ true.


(b) if

The proof is again by contradiction and is very similar to the only-if part. The key point is the construction of a truth assignment such that in any corresponding two-tuple relation where F holds, the FD f holds (contrary to the assumption that it cannot be inferred from F). ■
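The construction in part (a) is the classical two-tuple counterexample, read so that t and t' agree exactly on the attributes where α is true, as Lemma 4.3 requires. As a sketch (FDs as pairs of attribute sets, our representation), take α true exactly on the closure of X, the assignment used for Lemma 4.2. The resulting s satisfies every FD in F but violates X → Y whenever Y is not contained in that closure.

```python
def closure(X, F):
    """Attributes determined by X under F (the set Cl(X) of Lemma 4.2)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def two_tuple_relation(alpha):
    """t[A] = 1 for all A; t'[A] = 1 where alpha(A) is true, else 0."""
    t = {a: 1 for a in alpha}
    t_prime = {a: 1 if alpha[a] else 0 for a in alpha}
    return [t, t_prime]


def fd_holds(s, X, Y):
    """Classical check of X -> Y on a null-free relation."""
    return all(not all(t[a] == u[a] for a in X) or all(t[a] == u[a] for a in Y)
               for t in s for u in s)


attributes = "ABCD"
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
X, Y = {"B"}, {"A"}
cl_x = closure(X, F)
alpha = {a: a in cl_x for a in attributes}
s = two_tuple_relation(alpha)
print(s)                                     # [{'A': 1, 'B': 1, 'C': 1, 'D': 1}, {'A': 0, 'B': 1, 'C': 1, 'D': 0}]
print(all(fd_holds(s, W, Z) for W, Z in F))  # True
print(fd_holds(s, X, Y))                     # False
```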

The above two lemmas show that, in two-tuple relations, there is an equivalence between functional dependencies with null values under strong satisfiability and implicational statements in C. As we have shown in section 4.3, the "two-tuple relation" restriction is only a technical one in the case of strong satisfiability, since both observations of section 4.2 remain valid there. Because of this equivalence and Lemma 4.2, the following is a trivial consequence.
Theorem 4.1

Armstrong's inference rules are sound and complete for functional dependencies defined on relations with nulls and the requirement of strong satisfiability. ■

With the result of Theorem 4.1 we may safely talk about decompositions and the theory of normalization even when nulls are allowed in relation instances. As was demonstrated in section 4.3, we cannot claim the same result in the case of weakly satisfiable dependencies (where we accept a dependency as long as it is guaranteed not to be false). On the other hand, if we impose the state- and domain-dependent condition on allowable nulls, we show in the next section that the result holds for weak satisfiability in relation instances which we call "minimally incomplete".

4.5. Satisfiability for a Set of Functional Dependencies

In section 4.3 we discussed satisfiability for a single functional dependency. When we have a set of dependencies F, no dependency in F can be tested for weak satisfiability independently of the others. Consequently, Armstrong's inference rules do not hold. The following example illustrates this fact. Consider the relation scheme R(A, B, C), the FDs f1: A → B and f2: B → C, and the instance r:

r    A     B      C
     a1    null   c1
     a1    null   c2

Figure 4.6


The functional dependencies f1 and f2, evaluated independently on r, take the value unknown (they are weakly satisfied). This is not the case when the dependencies are evaluated simultaneously. For B → C to hold in r, the two B-values in r must be distinct. Therefore, A → B is false. Informally, when an FD is satisfied in r, something more may be known about the possible values that the nulls in r represent. Hence, the assumption that a null can be substituted with any domain value is not valid. This section deals with the above shortcomings.
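The example can be replayed by brute force, reading the instance of figure 4.6 as two tuples that agree on A = a1, have nulls for B, and have C-values c1 and c2. Nulls are None and each domain has two values (our assumptions). Each FD is weakly satisfied when evaluated alone, with a separate completion per tuple as in section 4.3. A joint test, which asks for one completion satisfying the whole set, fails.

```python
from itertools import product

NULL = None  # assumed representation of a missing value


def completions(r, domains):
    """Yield every completion of r, substituting each null independently."""
    holes = [(i, a) for i, row in enumerate(r) for a in row if row[a] is NULL]
    for choice in product(*(domains[a] for _, a in holes)):
        rows = [dict(row) for row in r]
        for (i, a), v in zip(holes, choice):
            rows[i][a] = v
        yield rows


def f_true(t, r, X, Y):
    """Classical f(t, r) on a null-free relation."""
    return all(any(t[a] != u[a] for a in X) or all(t[a] == u[a] for a in Y)
               for u in r)


def weakly_holds(r, X, Y, domains):
    """Each tuple has some completion in which f(t, r) is true."""
    comps = list(completions(r, domains))
    return all(any(f_true(rows[i], rows, X, Y) for rows in comps)
               for i in range(len(r)))


def jointly_satisfiable(r, fds, domains):
    """Some single completion of r satisfies every FD in fds."""
    return any(all(f_true(t, rows, X, Y) for X, Y in fds for t in rows)
               for rows in completions(r, domains))


domains = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}
r = [{"A": "a1", "B": NULL, "C": "c1"}, {"A": "a1", "B": NULL, "C": "c2"}]
f1, f2 = (("A",), ("B",)), (("B",), ("C",))
print(weakly_holds(r, f1[0], f1[1], domains))    # True
print(weakly_holds(r, f2[0], f2[1], domains))    # True
print(jointly_satisfiable(r, [f1, f2], domains)) # False
```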

First, we informally discuss rules that guide us in the "correct" substitution of nulls according to the FDs. The formalization of the rules and the examination of their properties are presented in the sequel. A null may be substituted only if there is exactly one option making the dependency true. For instance, if the null appears among the t[Y] values, t[X] has no nulls and there is a tuple t' in r with t[X] = t'[X], we may substitute the t[Y] value with the t'[Y] value. The justification of this substitution is two-fold. First, the resulting tuple has more information than the previous one. Second, and more important, this new information is not arbitrary: it is the only piece of information that makes the dependency true. The value which is substituted is the only value that a user can insert without creating an inconsistency. For substituting nulls in t[X] the rule is more complicated and, unfortunately, domain-dependent. One of the following two conditions must be met for such a substitution to take place.

(1) All completions of t[X] appear in r, t[Y] is not null, and there exists exactly one completion of t[X] in r, say t'[X], such that t'[Y] = t[Y]. The null in t[X] may be substituted with the corresponding value in t'[X].

(2) All completions of t[X] appear in r except one, t[Y] is not null, and for all tuples t' in r such that t'[X] is a completion of t[X], t'[Y] has no nulls and is distinct from t[Y]. The null in t[X] may be substituted with the value of the domain of X that does not appear in r.
Neither condition is easy to test. In addition, they seem unlikely to occur. For practical reasons, it may be better to leave the database incomplete, that is, to prohibit substitutions of nulls in t[X].

In addition to substituting for nulls in t[Y], we may find relationships between nulls. Not all nulls represent distinct, missing values. For the FDs to be satisfiable, certain nulls must be "equal" to other nulls. To guide us in equating nulls, we introduce a new type of constraint.†

Definition 4.1

A Null-Equality-Constraint (NEC) is a statement to the effect that two null values are equal (i.e. must take the same value in any substitution). For tuples ti, tj and attribute B, a NEC is denoted as: NEC: ti[B] := tj[B]. ■

Null equality constraints introduce equivalence classes for null values. We now formalize the rules that allow for a non-arbitrary increase of knowledge about null values in a relation where FDs are defined.

Definition 4.2

Given a relation R, a functional dependency X → Y embedded in R, and an instance r of R, the Null-Substitution-Rule (NS-rule) corresponding to the FD X → Y is: If for two tuples t_i, t_j in r we have t_i[X] = t_j[X] ≠ null or the NEC: t_i[X] ≐ t_j[X], then:

(a) if only one of t_i[Y], t_j[Y] is null, then this null is substituted with the non-null value of the other;

(b) if both t_i[Y] and t_j[Y] are null, then the null equality constraint NEC: t_i[Y] ≐ t_j[Y] is introduced. ■
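As a rough illustration of Definition 4.2, the Python sketch below applies the NS-rules in passes until none fires. It is not the thesis's own procedure: the function name apply_ns_rules, the encoding of tuples as dicts with None for a null, and the use of a union-find structure to record NECs are all assumptions made for this example. Rule (a) attaches a constant to a class of nulls, rule (b) merges two null classes, and when both Y-values are distinct constants no rule fires.

```python
from itertools import combinations


def apply_ns_rules(rows, fds):
    """Apply the NS-rules (Definition 4.2) in passes until none applies.

    rows: list of dicts mapping attribute -> value; None marks a null.
    fds:  list of (X, Y) pairs, each a tuple of attribute names.
    Returns new rows in which each remaining null is shown as ('null', root);
    nulls sharing the same root are tied together by NECs.
    """
    parent = {}  # union-find forest over null cells (row index, attribute)
    subst = {}   # class root -> constant substituted for the class, or None
    for i, row in enumerate(rows):
        for a, v in row.items():
            if v is None:
                parent[(i, a)] = (i, a)
                subst[(i, a)] = None

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    def current(i, a):
        """('const', v) for a known value, ('null', root) for an open null."""
        v = rows[i][a]
        if v is not None:
            return ("const", v)
        root = find((i, a))
        if subst[root] is not None:
            return ("const", subst[root])
        return ("null", root)

    changed = True
    while changed:  # one iteration of this loop is one pass
        changed = False
        for X, Y in fds:
            for i, j in combinations(range(len(rows)), 2):
                # Fires only if the X-values are equal constants or nulls of
                # one equivalence class; a null never equals a constant here.
                if any(current(i, a) != current(j, a) for a in X):
                    continue
                for b in Y:
                    (ki, vi), (kj, vj) = current(i, b), current(j, b)
                    if (ki, vi) == (kj, vj):
                        continue
                    if ki == "null" and kj == "null":  # rule (b): add a NEC
                        parent[vj] = vi
                    elif ki == "null":                 # rule (a)
                        subst[vi] = vj
                    elif kj == "null":                 # rule (a)
                        subst[vj] = vi
                    else:
                        continue  # two distinct constants: no rule applies
                    changed = True

    def label(i, a):
        kind, v = current(i, a)
        return v if kind == "const" else ("null", v)

    return [{a: label(i, a) for a in row} for i, row in enumerate(rows)]
```

Because the rules are tried in a fixed order inside each pass, listing the FDs in a different order can lead this sketch to a different final instance, which is the order dependence discussed later in this section.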


† A parenthesis concerning our notation. While X may have several attributes, say X = {X_1, X_2, ..., X_k}, we compare X-values directly with null. For a tuple t, t[X] = null implies that at least one of the X_i values is null. Similarly, t[X] ≠ null implies that no X_i attribute value is null.
Proposition 4.2

The NS-rules can only be applied a finite number of times on a relation instance r. Applying all rules takes time polynomial in the size of the relation instance.

Proof 

(a)  Finiteness 

Initially, r has a finite number of constant values and a finite number of null values, with each null participating in a distinct equivalence class. The application of a rule never introduces a new constant value. Also, it may reduce but will never increase the number of equivalence classes (i.e. when a NEC is introduced). In the sequence of instances r' produced by successive NS-rule applications, all elements are distinct, and since no new constants appear there are only finitely many possible instances. This suffices to show that the rules are a finite system.

(b)  Complexity  Analysis 

The NS-rules are applied in several passes. In each pass, all NS-rules are applied for as many tuples as possible. In applying the NS-rule X → Y, the instance r may first be sorted in time O(x·n·log n), where x is the number of attributes in X and n is the number of tuples in r. When sorting, null values have the lowest precedence and are always distinct unless they belong to the same equivalence class. In this last case they appear together in the sorted relation. Applying X → Y in a pass may require equating Y-values in more than one tuple (same equivalence class). Hence, a pass over the Y-values must be made for each change. This takes time O(n²). Since all rules are applied in each pass, the time required for a pass is O(|F|·(n² + x·n·log n)), or O(|F|·n²) for x substantially smaller than n. When we start in an instance with p attributes, we have at most n·p distinct symbols (constant values and nulls). Every pass reduces the number of distinct symbols, hence we have at most n·p passes. Therefore, no rule can be applied after O(|F|·n³·p) time.† ■

† According to a recent result by [Downey et al 80], the time complexity of the test is O(|F|·n·log(|F|·n)). Applying the NS-rules can be treated as computing the congruence closure of a directed graph. There is a heavy penalty on space requirements for the "fast" congruence closure algorithm.
Given a relation R and a set of FDs F embedded in R, we say that an instance r of R is minimally-incomplete if no NS-rule can be applied on r. The intuitive meaning of a minimally incomplete relation state is that nothing more can be said about the nulls in this state. The importance of such states is demonstrated with the theorems that follow. The first two theorems present methods for testing FD-satisfiability. The test is done with an algorithm (Figure 4.7) that works on relation instances where no nulls are present and runs in time O(|F|·n·log n). The third theorem makes the connection between "uniqueness" of a minimally incomplete state and satisfiability of FDs. This shows that the set of NS-rules is a finite Church-Rosser system.
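The sort-based test of Figure 4.7 can be sketched in Python for the null-free case. This is an illustrative translation under assumptions: tuples are dicts, FDs are (X, Y) pairs of attribute tuples, all values are mutually comparable so that sorting works, and the names check_fds and project are hypothetical.

```python
from itertools import groupby


def project(t, attrs):
    """Return t[attrs] as a tuple of values."""
    return tuple(t[a] for a in attrs)


def check_fds(r, fds):
    """Sort-based test that every FD X -> Y holds in a null-free relation r.

    r:   list of dicts (attribute -> value) whose values are mutually comparable.
    fds: list of (X, Y) pairs, each a tuple of attribute names.
    """
    for X, Y in fds:
        def key(t):
            return project(t, X)

        ordered = sorted(r, key=key)  # sort relation r on X
        # Tuples that agree on X are now adjacent: compare the Y-values of
        # each run with those of the run's first tuple (t_first).
        for _, run in groupby(ordered, key=key):
            t_first = next(run)
            if any(project(t_next, Y) != project(t_first, Y) for t_next in run):
                return False  # return(no)
    return True  # return(yes)


rows = [{"A": 1, "B": 1}, {"A": 2, "B": 3}, {"A": 1, "B": 1}]
print(check_fds(rows, [(("A",), ("B",))]))                       # True
print(check_fds(rows + [{"A": 2, "B": 4}], [(("A",), ("B",))]))  # False
```

Using groupby keeps the single left-to-right scan of the figure, so the cost per FD is dominated by the O(n log n) sort.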

Theorem 4.2

Let R be a relation scheme, F a set of FDs embedded in R, and r an instance of R. Consider the application of the algorithm TEST-FDs with the following convention for null values:

Convention: Any equality comparison where a null is involved is positive. Also, any inequality comparison where a null is involved is positive, unless both values compared are null and they belong to the same equivalence class.

F is strongly-satisfied in r iff TEST-FDs(r, F) = yes.
Proof. 

The convention made implies that when testing an FD X → Y on two tuples t_i and t_j we have the following. If one of t_i[X], t_j[X] is null, then the comparison (t_i[X] = t_j[X]) is true. If the previous comparison is true, then so is the comparison (t_i[Y] ≠ t_j[Y]), unless there exists a NEC: t_i[Y] ≐ t_j[Y] (again we assume that at least one of t_i[Y], t_j[Y] is null). It may seem that with the above convention we have a problem in applying the algorithm TEST-FDs.† Two values that at

† Another problem is sorting the null values under the above convention. Alternatively, another version of TEST-FDs may be used, where the relation is not sorted and each tuple is tested against every other tuple in the relation. The running time is now O(|F|·n²).
Figure 4.7 Testing for FD-satisfiability


Given a relation r, test a set of FDs F in r for consistency.

TEST-FDs (parameters: r, F) returns (boolean)
begin
    comment: comparison is based on lexicographical order
    comment: read_next_tuple(r) reads the next tuple in order
             from r and returns EOF if no more tuples

    for each X → Y in F do
    begin
        sort relation r on X
        t_first ← read_next_tuple(r)
        while t_first ≠ EOF do
        begin
            t_next ← read_next_tuple(r)
            while t_next ≠ EOF and t_next[X] = t_first[X] do
            begin
                if t_next[Y] ≠ t_first[Y]
                    then return(no)
                    else t_next ← read_next_tuple(r)
            end
            t_first ← t_next
        end
    end
    return(yes)
end


Complexity Analysis

The algorithm runs in O(|F|·n·log n) time, where n is the size (number of tuples) of r and |F| is the number of dependencies. Each FD is tested in time O(n·log n), the time to sort the relation.

Additional Assumptions. If bucket sort is used, sorting takes time O(n·p), where p is the number of attributes in X for a dependency X → Y. Furthermore, if there is only one dependency (e.g. BCNF with one key) and the relation is already sorted, the test requires time linear in the relation size.
some point were compared and found equal may at another point be found not equal. For F not to be strongly satisfied, it suffices to find a completion of r where the FDs are violated. In comparisons we consider all possible completions. Since Armstrong's rules are sound and complete, we can test FDs for strong satisfiability independently. But notice that for the same dependency X → Y the same attribute values are never compared both for equality and for inequality when TEST-FDs is applied.

if-part 

The assumption here is that TEST-FDs(r, F) = yes. We show that F is strongly satisfied in r. Assume that there is an FD X → Y which is violated in r. That is, for two tuples t_i, t_j in r, there exist completions t'_i, t'_j such that t'_i[X] = t'_j[X] and t'_i[Y] ≠ t'_j[Y]. Trivially, when TEST-FDs is applied on r, the comparisons between the non-completed X and Y values of t_i and t_j are both positive, thus TEST-FDs(r, F) = no. A contradiction.

only-if-part

Suppose  that  F  is  strongly  satisfied  in  r  and  that  TEST  —FDs  returns  with  a  no 
when  applied  on  r  with  our  convention  for  nulls.  The  contradiction  is  clear.  Any 
time  the  equality  comparison  for  the  X-values  of  two  tuples  and  the  inequality 
comparison  for  the  Y-values  of  the  same  tuples  are  found  positive,  a  completion 
of  r  is  illustrated  where  the  FD  is  violated.  ■ 
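The pairwise variant of TEST-FDs mentioned in the footnote to Theorem 4.2 avoids sorting nulls altogether. The sketch below assumes a hypothetical encoding in which a null is the pair ('null', k), with k naming its equivalence class, and any other value is a constant. Comparisons on multi-attribute X and Y are lifted attribute by attribute (every X-attribute compares equal, some Y-attribute compares unequal); this lifting is an assumption of the sketch rather than something the proof spells out.

```python
from itertools import combinations


def is_null(v):
    """A null is encoded as ('null', k); k names its equivalence class."""
    return isinstance(v, tuple) and len(v) == 2 and v[0] == "null"


def eq_strong(u, v):
    # Any equality comparison involving a null is positive.
    return is_null(u) or is_null(v) or u == v


def neq_strong(u, v):
    # Any inequality comparison involving a null is positive, unless both
    # values are nulls of the same equivalence class.
    if is_null(u) and is_null(v):
        return u != v
    return is_null(u) or is_null(v) or u != v


def strongly_satisfied(r, fds):
    """Pairwise TEST-FDs under the Theorem 4.2 convention, O(|F| * n^2)."""
    for X, Y in fds:
        for ti, tj in combinations(r, 2):
            if all(eq_strong(ti[a], tj[a]) for a in X) and any(
                neq_strong(ti[b], tj[b]) for b in Y
            ):
                return False  # some completion of r violates X -> Y
    return True


N1 = ("null", 1)
fd = [(("A",), ("B",))]
print(strongly_satisfied([{"A": 1, "B": 1}, {"A": N1, "B": 1}], fd))  # True
print(strongly_satisfied([{"A": 1, "B": 1}, {"A": N1, "B": 2}], fd))  # False
```

In the second call the null in A could be completed to 1, which would make the two B-values clash, so the FD is not strongly satisfied.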

Theorem  4.3 

Let R be a relation scheme, F a set of FDs embedded in R, and r a minimally-incomplete instance of R. Consider the application of the algorithm TEST-FDs with the following convention for null values:

Convention: Any inequality comparison where a null is involved is negative. Also, any equality comparison where a null is involved is negative, unless both values compared are null and they belong to the same equivalence class.

F is weakly-satisfied in r iff TEST-FDs(r, F) = yes.
Proof. 

As in the previous theorem, we consider the problem that may be introduced with our convention for null values. It may be that the same two values are compared and found equal at some point and at another point not equal. We show that when TEST-FDs is applied in a minimally incomplete state of r this never happens. Consider two tuples t_i and t_j in r. Suppose that t_i[X] is null and t_j[X] is not. If X appears only on the left of dependencies, we have no problem (TEST-FDs makes only equality comparisons on X-values). Similarly for X appearing only on the right of dependencies. Consider the case where X appears on the left of a dependency X → Y and on the right of a dependency Z → X. In testing X → Y we assume that t_i[X] ≠ t_j[X]. In testing Z → X, we only consider X-values when t_i[Z] = t_j[Z]. If the state is minimally incomplete, we would not have t_i[X] null in that case (by application of the NS-rule). The case of both t_i[X] and t_j[X] being null is treated with similar arguments. In this case the outcome of the evaluation depends on whether or not the nulls belong to the same equivalence class. We also note that the convention allows for sorting. Null values are considered distinct and their order is not important. (They are never equated unless they are in the same equivalence class, in which case they appear together.)

if-part

The substitution of nulls with values different from the ones appearing in r (the same new value for all nulls in one equivalence class) illustrates a completion of r where all FDs are satisfied.

only-if-part 

We show that if there is a completion of r where the FDs are satisfied, then TEST-FDs(r, F) = yes. Suppose r' is such a completion. If TEST-FDs has a no answer, there must exist two tuples t_i and t_j in r such that for a functional dependency X → Y the comparisons (t_i[X] = t_j[X]) and (t_i[Y] ≠ t_j[Y]) are both positive. The first comparison is positive under our convention when t_i[X] and t_j[X] are equal constants, or both are nulls in the same equivalence class. In this last case, they both have the same completions in r' (as in any other completion of r). Similarly, for the second comparison to be positive it must be that t_i[Y] and t_j[Y] are distinct constant values. It follows immediately from the above arguments that the FD X → Y is violated for the two tuple completions in r'. A contradiction. ■
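Under the same hypothetical encoding of nulls as ('null', k), the Theorem 4.3 convention only changes the two comparison predicates. The sketch below assumes its input is already minimally incomplete (for example, the output of applying the NS-rules until none fires); on other inputs its answer is not covered by the theorem. The function names are illustrative.

```python
from itertools import combinations


def is_null(v):
    """A null is encoded as ('null', k); k names its equivalence class."""
    return isinstance(v, tuple) and len(v) == 2 and v[0] == "null"


def eq_weak(u, v):
    # Equality involving a null is negative, unless both values are nulls
    # of the same equivalence class.
    if is_null(u) or is_null(v):
        return is_null(u) and is_null(v) and u == v
    return u == v


def neq_weak(u, v):
    # Inequality involving a null is negative.
    if is_null(u) or is_null(v):
        return False
    return u != v


def weakly_satisfied(r, fds):
    """Pairwise TEST-FDs under the Theorem 4.3 convention.

    Only meaningful when r is minimally incomplete (no NS-rule applies).
    """
    for X, Y in fds:
        for ti, tj in combinations(r, 2):
            if all(eq_weak(ti[a], tj[a]) for a in X) and any(
                neq_weak(ti[b], tj[b]) for b in Y
            ):
                return False
    return True
```

Only equal constants, or nulls tied by a NEC, can trigger a violation here, which mirrors the case analysis in the only-if-part of the proof.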

Note  that  the  test  for  strong  satisfiability  is  less  expensive  than  the  one  for 
weak  satisfiability  since  it  does  not  require  a  minimally  incomplete  instance. 
This  comes  as  no  surprise  -  intuitively,  very  few  relation  instances  would  be 
strongly-consistent. 

The NS-rules applied in a different order may result in different minimally incomplete states. This is illustrated with an example. Consider a relation R with three attributes, the dependencies A → B, C → B, and the instance r (figure 4.8). Applying the rule A → B first, we get a minimally incomplete state r'. On the other hand, if we first apply C → B we get a different minimally incomplete state r''.

        r'
    A    B    C
    a1   b1   c1
    a2   b2   c2
    a1   b1   c2

Figure 4.8

From Definition 4.2, an NS-rule for an FD X → Y is applied if there exist tuples t, u such that t[X] = u[X] and one or both of t[Y], u[Y] are null. We now extend the notion of an NS-rule application and assume that a rule may be applied even if neither t[Y] nor u[Y] is null, but as constants they are distinct. In this case they are both replaced by the inconsistent element (the nothing data value). This triggers the replacement with nothing of all constants that are equal to them. In our example, if A → B is applied first, producing r', then C → B can be applied on r', resulting in an instance with all values in the B column equal to nothing. It is easily observed that the application of the rules in reverse order will produce the same instance. The theorem below is proven in [Graham 80].

Theorem 4.4 [Graham 80]

Let R be a relation scheme, F a set of FDs, and r an instance of R. Then:

(a) The application of the NS-rules will produce a unique minimally incomplete instance (i.e. the NS-rules constitute a Church-Rosser system).

(b) F is weakly-satisfied in r iff the nothing value does not appear in the resulting minimally incomplete instance. ■

For the proof of the theorem the notion of congruence closure is used [Downey et al 80]. However, instead of constructing the congruence closure of the graph for r, one starts from the congruence closure and represents each equivalence class as follows. An equivalence class is represented by its smallest element.† If an equivalence class contains the nothing symbol, this is considered the smallest element. A row of a relation instance is constructed by collecting the descendants and equivalence class of each of the minimum elements. The result is a relation instance that possibly contains nothing values. This instance is unique and is exactly the minimally incomplete instance produced by application of the NS-rules.


† Order here is defined as follows. Constants are always smaller than nulls, and two unequal constants are incomparable. Nulls are indexed (in order of appearance in a row) and null_i < null_j iff i < j.
Theorem 4.4 verifies that, in any minimally incomplete instance, the test for satisfiability will determine correctly whether the FDs are satisfied.
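The extended rules, where conflicting constants collapse to nothing, can be sketched with a union-find over symbols: one symbol per null cell and one per (attribute, constant) pair. This is a naive fixpoint written for illustration, not the congruence closure algorithm of [Downey et al 80] nor the construction used in Graham's proof; extended_ns_closure, weakly_satisfied and the string NOTHING are hypothetical names. Following Theorem 4.4, the result should not depend on the order in which the FDs are tried, and F is weakly satisfied exactly when NOTHING does not appear.

```python
from itertools import combinations

NOTHING = "nothing"  # stands for the inconsistent element


def extended_ns_closure(rows, fds):
    """Apply the extended NS-rules until none fires.

    rows: list of dicts (attribute -> value), None marking a null.
    fds:  list of (X, Y) pairs of attribute-name tuples.
    Each cell is a symbol: (row, attribute) for a null, or
    (attribute, 'const', value) for a constant, shared by equal constants.
    """
    parent, value = {}, {}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    def union(s, t):
        rs, rt = find(s), find(t)
        if rs == rt:
            return False
        vs, vt = value[rs], value[rt]
        if vs is not None and vt is not None and vs != vt:
            merged = NOTHING  # distinct constants forced to be equal
        else:
            merged = vs if vs is not None else vt
        parent[rt] = rs
        value[rs] = merged
        return True

    cells = []
    for i, row in enumerate(rows):
        syms = {}
        for a, v in row.items():
            s = (i, a) if v is None else (a, "const", v)
            parent.setdefault(s, s)
            value.setdefault(s, v)
            syms[a] = s
        cells.append(syms)

    changed = True
    while changed:
        changed = False
        for X, Y in fds:
            for ci, cj in combinations(cells, 2):
                if all(find(ci[a]) == find(cj[a]) for a in X):
                    for b in Y:
                        changed |= union(ci[b], cj[b])

    def label(s):
        root = find(s)
        return value[root] if value[root] is not None else ("null", root)

    return [{a: label(s) for a, s in syms.items()} for syms in cells]


def weakly_satisfied(rows, fds):
    """Theorem 4.4(b): weakly satisfied iff NOTHING does not appear."""
    closed = extended_ns_closure(rows, fds)
    return all(v != NOTHING for row in closed for v in row.values())
```

On an instance like the one behind Figure 4.8 (tuples a1 b1 c1, a2 b2 c2 and a1 with a null B and c2), both FD orders drive every B-value to NOTHING, matching the discussion above.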

4.6.  Summary  of  Results 

Functional dependencies were examined in the light of incomplete information in a database. Our results are both encouraging and, in some respects, discouraging concerning the possibility of allowing nulls with no restrictions in relations. Two notions of FD satisfiability were introduced. The first is the regular one, which requires that an FD takes the truth-value true when it is interpreted as a predicate on relation instances. In addition, a weak notion of satisfiability was defined which allows for uncertainty on the validity of an FD as long as this uncertainty does not introduce contradictions. It was shown that a null value does not have an impact on the validity of an FD if it appears in special places. This is because there exists a substitution of this null (possibly all substitutions) which results in having the dependency satisfied. Furthermore, finding these cases of satisfiability is not computationally hard.

On the other hand, there are some extreme cases where all substitutions of the null result in inconsistent states. This occurs with the weak notion of satisfiability and domain-size restrictions. The test to find such cases is domain- and state-dependent, thus having an unacceptable complexity for practical considerations. It was argued that in practice the above extreme cases are unlikely to appear, provided that the dependencies are carefully defined (e.g. on attributes with large domains).

Weak satisfiability seems to be the more important and interesting notion from a practical viewpoint. If a database were actually to reflect the real world situation, it would be "overconstrained". That is, there is a large number of semantic constraints which would make sense for a database. However, database systems do not usually have the ability to maintain all these constraints. The test of constraint validity in a database instance, apart from being prohibitively expensive, results mainly in verifying that most of the data is "dirty". On the other hand, null values and weak satisfiability allow constraints to be valid in more instances.

A basic result of this chapter is the verification, for extended FDs, of the soundness and completeness of the same inference rules that were sound and complete for FDs with no nulls. This allows us to conclude that all work on normalization, decomposition, etc. where FDs are involved can be applied directly in our framework of incomplete information.

As was mentioned in chapter 1, the universal relation assumption in relational database design is questioned on both practical and theoretical grounds. We have provided a partial reply to the practical attacks on the possibility of a universal relation instance. More realistic instances may now be perceived: the ones where nulls are allowed. In [Bernstein and Goodman 80] it is shown that the requirement of having the universal relation assumption defeats the purpose of normalization, which is to avoid update anomalies. In the next chapter we deal with this shortcoming.


Chapter  5 


A  Weaker  Form  of  the  Universal  Relation  Assumption 


Nothing  begins  and  nothing  ends 
-  Francis  Thompson 


5.1.  Introductory  Concepts 

Selecting a "good" set of relation schemes to model a part of the real world is the process of relational database design. The choice is not an easy one since it involves the consideration of many factors, sometimes with conflicting goals. Among the important factors taken into consideration in relational database design is the intended usage of the schema, that is, the type of the most frequent queries, the type of the most frequent modifications and the nature of the application which is modelled (i.e. modification- or query-oriented).

Currently, there is a conceptual understanding of the factors involved in relational database design. However, their interaction is not understood well enough that, given all the application requirements, we could mechanically design the most appropriate database. Still, there are some factors in database design that are well understood and can be treated formally. Redundancy, potential inconsistencies and modification anomalies are among them. The notion of functional dependency has been the central concept for their formalization and for the formulation of rules that guide us in the selection of a "good" set of relation schemes for the database.

As an illustration, consider the relation scheme [Codd 72]:

COMPANY(employee#, department#, manager#, contract-type)

There are several problems that could potentially affect an instance of this relation scheme.

(a) Update anomalies. The change of manager in a department requires the change of an unpredictable number of tuples in a COMPANY instance, since the manager# is repeated for each employee in the department he manages. Inconsistencies may be created if the above process is not performed correctly.

(b) Insertion anomalies. When the first employee is hired for a department, we must also insert a manager for the department and a contract type. This may not be a reasonable restriction for the intended application.

(c) Deletion anomalies. This is the inverse of the insertion anomaly. When the last employee in a department is fired, we also lose the information about the manager and the contract type of the department.

(d) Redundancies. The contract type of the department and the manager are repeated for each employee.

The above problems go away if we instead use the two relation schemes [Codd 72]:†

EMPLOYEE(employee#, department#)

DEPARTMENT(department#, manager#, contract-type)


† On the other hand, this set of relation schemes may create performance problems if the query "List all the employees managed by X" is asked frequently. To answer the query we now require a join instead of the less expensive selection and projection.
The choice of this set of schemes is not arbitrary. It is based on the reasonable semantic rules "An employee works in only one department, and a department has only one manager and one contract type". These rules are formalized syntactically as the functional dependencies employee# → department# and department# → manager#, contract-type. A design theory has been developed around the concept of functional dependencies, and the basic results are now reviewed.

Let F be a set of functional dependencies (FDs) for relation scheme R and X → Y be a functional dependency. We say that F implies X → Y if there is no counterexample relation instance r of R that satisfies F but not X → Y. We denote by F⁺ the closure of F, i.e. the set of dependencies that are implied by F. We say that a set of attributes X is a key of R if X → R is in F⁺ and for no proper subset Y of X is Y → R in F⁺. Attributes in keys are called prime. Armstrong's inference rules for FDs (figure 4.4) are sound and complete. The closure (with respect to F) of a set X of attributes, denoted by X⁺, is defined as the set of attributes A such that X → A can be deduced from F by Armstrong's axioms. Computing F⁺ is very expensive, simply because the set of dependencies in F⁺ can be large. On the other hand, computing X⁺ is not hard; it takes time proportional to the length of the dependencies in F [Ullman 80].
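The closure X⁺ can be computed by a simple fixpoint over F, and the key definition above then reduces to closure tests. The sketch below is the naive quadratic version, not the linear-time algorithm the text alludes to; FDs are assumed to be (lhs, rhs) pairs of attribute sets, and attribute_closure and is_key are illustrative names.

```python
def attribute_closure(X, fds):
    """Return X+ with respect to fds, a list of (lhs, rhs) attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole left-hand side is derived, add the right-hand side.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_key(X, R, fds):
    """X is a key of R if X+ = R and no proper subset of X has that property."""
    if attribute_closure(X, fds) != set(R):
        return False
    # Dropping one attribute at a time suffices: closure is monotone, so if a
    # smaller subset determined R, some X - {a} would determine R as well.
    return all(attribute_closure(set(X) - {a}, fds) != set(R) for a in X)


R = {"employee#", "department#", "manager#", "contract-type"}
F = [({"employee#"}, {"department#"}),
     ({"department#"}, {"manager#", "contract-type"})]
print(is_key({"employee#"}, R, F))                 # True
print(is_key({"employee#", "department#"}, R, F))  # False
```

For the COMPANY scheme this confirms that employee# alone is a key, while employee#, department# is a superkey but not minimal.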

Two sets of dependencies F and G are said to cover each other if F⁺ = G⁺. Among these "equivalent" sets of FDs, some are minimal in that they have no redundant dependencies and also in that there are no redundant attributes in the left or right hand sides of the FDs in the set. A minimal cover can always be found for a set of FDs (in polynomial time). Database design algorithms are based on minimal covers [Beeri et al 78].
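A minimal cover can be obtained with a common textbook construction: split right-hand sides into single attributes, drop extraneous left-hand-side attributes, then drop redundant dependencies. The sketch below follows that construction under an assumed (lhs, rhs) encoding of FDs; it is not necessarily the procedure of [Beeri et al 78], the names closure and minimal_cover are illustrative, and the cover it returns may depend on the order of the input FDs.

```python
def closure(X, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of frozensets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover of fds as a list of (lhs, {A}) frozenset pairs."""
    # 1. Right-hand sides with a single attribute.
    g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in rhs]
    # 2. Remove extraneous left-hand-side attributes: B is extraneous in
    #    X -> A if A is already in (X - {B})+.
    for idx in range(len(g)):
        lhs, rhs = g[idx]
        for b in sorted(lhs):
            if rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
                g[idx] = (lhs, rhs)
    # 3. Remove dependencies implied by the remaining ones.
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        if g[i][1] <= closure(g[i][0], rest):
            g = rest
        else:
            i += 1
    return g
```

For example, minimal_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]) reduces to the two dependencies A → B and B → C (single-letter strings are used as attribute sets here).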

The decomposition of a relation scheme R is its replacement by a set of relation schemes ρ = (R_1, R_2, ..., R_k) such that

$$ R = \bigcup_{i=1}^{k} R_i $$

Decompositions are considered good when they are equivalent to R. Intuitively, ρ is equivalent to R if they both represent the same external application. In more formal terms, their consistent states (instances) are the same. The strongest equivalence definition [Beeri et al 78] requires two properties for ρ, both based on the concept of FDs:

E1. Lossless-join property;

E2. Dependency preservation.

We say that ρ has the lossless-join property (with respect to F) if for every relation instance r of R satisfying F:

$$ r = \pi_{R_1}(r) \bowtie \pi_{R_2}(r) \bowtie \cdots \bowtie \pi_{R_k}(r) $$

where "π" and "⋈" denote the projection and join relational operators respectively.

Intuitively, the above property states that the consistent instances of R and ρ are isomorphic, with the join and the projection the respective isomorphisms. If ρ has the lossless-join property, it is guaranteed that any consistent relation instance can be recovered from its projections. Testing for the lossless-join property is computationally easy [Ullman 80].
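One well-known easy test for the lossless-join property is the chase (tableau) test: build one row per R_i with a distinguished symbol on the attributes of R_i, equate symbols as the FDs demand, and check whether some row becomes fully distinguished. The sketch below assumes FDs given as (lhs, rhs) attribute sets; lossless_join is a hypothetical name, and the loops are written for clarity rather than efficiency.

```python
def lossless_join(R, decomposition, fds):
    """Chase test for the lossless-join property.

    R: iterable of attributes; decomposition: list of attribute sets R_i;
    fds: list of (lhs, rhs) attribute sets.
    """
    # Row i: distinguished symbol 'a' on R_i, a private symbol ('b', i) elsewhere.
    rows = [{A: "a" if A in Ri else ("b", i) for A in R}
            for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for s in rows:
                for t in rows:
                    if s is t or any(s[A] != t[A] for A in lhs):
                        continue
                    for A in rhs:
                        x, y = s[A], t[A]
                        if x == y:
                            continue
                        # Equate x and y throughout column A, preferring 'a'.
                        keep = "a" if "a" in (x, y) else min(x, y)
                        drop = y if keep == x else x
                        for row in rows:
                            if row[A] == drop:
                                row[A] = keep
                        changed = True
    return any(all(v == "a" for v in row.values()) for row in rows)


company = ["employee#", "department#", "manager#", "contract-type"]
F = [({"employee#"}, {"department#"}),
     ({"department#"}, {"manager#", "contract-type"})]
rho = [{"employee#", "department#"},
       {"department#", "manager#", "contract-type"}]
print(lossless_join(company, rho, F))  # True
```

The EMPLOYEE/DEPARTMENT decomposition passes because department# → manager#, contract-type fills the first row with distinguished symbols.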

The second desired property of decompositions is that dependencies should be preserved. This is desirable since dependencies are integrity constraints on the database. Formally, we say that the projection of F on a set of attributes R_i is the set of FDs X → Y such that XY ⊆ R_i and X → Y is in F⁺. We denote this projection by F_i. Decomposition ρ preserves F if

$$ \Big( \bigcup_{i=1}^{k} F_i \Big)^{+} = F^{+} $$

Schemas ρ that preserve a set of FDs F are sometimes described in the literature as embedding a cover of F. The test for this property can also be made in polynomial time.
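A polynomial dependency-preservation test that never materializes the projections F_i grows, for each X → Y in F, an attribute set Z from X using only what each R_i can contribute, and accepts if Y ⊆ Z at the fixpoint. The sketch below implements that standard algorithm under an assumed (lhs, rhs) encoding of FDs; closure and preserves_dependencies are illustrative names.

```python
def closure(X, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True iff the union of the projections F_i implies every FD in fds."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for Ri in decomposition:
                part = set(Ri)
                # What R_i can contribute: (Z ∩ R_i)+ restricted to R_i.
                gained = closure(z & part, fds) & part
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True
```

A standard counterexample is the scheme with attributes C, S, Z, F = {CS → Z, Z → C} and the decomposition (SZ, CZ): the test rejects it because no single scheme contains C, S and Z together.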

The most significant decompositions are 3NF (third normal form) and BCNF (Boyce-Codd normal form). A decomposition is in 3NF if none of its nonprime attributes is transitively dependent on any of its keys.† A relation scheme R is in BCNF if for all X, Y ⊆ R, whenever X → Y (with Y not contained in X), X contains a key of R; a decomposition ρ is in BCNF if each of its schemes is. Unfortunately, BCNF decompositions cannot always be made to have the two desired properties of equivalence [Beeri and Bernstein 79]. Furthermore, it is computationally very hard (NP-hard) to determine whether a given decomposition is BCNF. Decompositions which are 3NF are "weaker" than BCNF (they are implied by BCNF) but can always be made to have properties E1 and E2 [Biskup et al 79]. The motivation behind these decompositions is that, as database schemes, they solve the anomaly and redundancy problems which R has, as discussed at the start of this section. In particular, our example decomposition of COMPANY into the two relation schemes EMPLOYEE and DEPARTMENT is BCNF and has both equivalence properties (lossless-join and dependency preservation).

5.2.  The  Universal  Relation  Assumption  and  Database  Consistency 

In the previous section we considered decomposition as the major step in relational database design. Hence, we implicitly assumed that the design starts from a relation scheme which has all the attributes of the universe, the Universal Relation. The goal of a good design is to decompose the universal relation into a set of relation schemes which is in normal form (e.g. BCNF) and satisfies the two equivalence properties E1 and E2. It turns out that if we want the decomposition to have these properties, we must make a very restrictive assumption about decomposition instances: the Universal Relation Instance assumption, or simply, UR-assumption.

Definition 5.1 Universal Relation Assumption

Let R be a universal relation and ρ = (R_1, R_2, ..., R_k), where R = R_1 ∪ R_2 ∪ ... ∪ R_k, be a


† An attribute A is transitively dependent on X if there exists Y ⊆ R such that A is not in Y, X → Y and Y → A are in F⁺, but Y → X is not.
decomposition of R. For any instance (r_1, r_2, ..., r_k) of ρ there exists an instance r of R such that π_{R_i}(r) = r_i for all i = 1, 2, ..., k. ■

This assumption may be considered as a constraint on decomposition instances. If a decomposition instance satisfies this constraint, it is called join-consistent or UR-consistent.†

An obvious way to test for join-consistency is to form the natural join of all r_i's and then project on the attributes of each R_i to verify whether r_i has been recovered. This test is correct, since it can be easily proven that if the database is join-consistent, then the natural join is a UR instance corresponding to the database in the exact way defined above. Clearly, this test has an unacceptable complexity for practical considerations (exponential in the size of the database). Furthermore, very little hope exists for a substantially better test, since the problem is known to be NP-complete [Honeyman et al 80].
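The naive join-then-project test can be written directly. In the sketch below a relation is assumed to be a pair (attribute tuple, set of value tuples), and natural_join, project and join_consistent are hypothetical names; the intermediate joins may grow very large, which is the blow-up the paragraph warns about.

```python
def natural_join(r, s):
    """Natural join of relations given as (attributes, set of value tuples)."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    out_attrs = tuple(ra) + tuple(a for a in sa if a not in ra)
    out = set()
    for t in rrows:
        tr = dict(zip(ra, t))
        for u in srows:
            us = dict(zip(sa, u))
            if all(tr[a] == us[a] for a in common):
                merged = {**tr, **us}
                out.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, out


def project(r, attrs):
    """Projection of relation r on the attribute tuple attrs."""
    ra, rows = r
    idx = [ra.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rows}


def join_consistent(db):
    """Naive test: join every r_i, then check each projection returns r_i."""
    joined = db[0]
    for rel in db[1:]:
        joined = natural_join(joined, rel)
    return all(project(joined, attrs) == rows for attrs, rows in db)
```

For instance, taking r_1 and r_2 of Figure 5.1 together with an r_3 that pairs A = 1 with C = 2 and A = 2 with C = 1 (values assumed here for illustration), the full join is empty, so the test reports that the database is not join-consistent.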

When designing a relational database, we need to be able to talk about the dependencies used in the decomposition process in a consistent way. To do this we make the UR-assumption. It allows us to talk about dependencies in a common framework, the relation with all the attributes (universal relation), which allows for their consistent definition and further manipulation. This guarantees:

1.  Uniqueness  of  FDs 

Syntactically identical FDs are also semantically equivalent.

2.  Completeness  of  inference  rules  for  FDs 

The  inferred  FDs  are  precisely  the  implied  FDs. 

Although these properties hold in a single relation, they will not always hold in a decomposition which is not based on the universal relation assumption. This

† This was not originally the intention for the UR-assumption. The UR-assumption was considered as a one-time invocation to ease the database design difficulties. It turns out, though, that the assumption must remain as a constraint when modifications of the database together with consistency requirements (with respect to FDs) are considered.
of course has serious effects on our definition of equivalence. The FDs in the decomposition will be semantically different from the syntactically identical FDs in R.

By definition of schema equivalence, ρ and R must have the same consistent instances. Consistency of R instances is straightforward: all dependencies in F must be satisfied. Following this line of reasoning, it would seem natural to infer that an instance of ρ is consistent (with respect to F) if every relation instance r_i is consistent (with respect to F_i). However, this way of defining consistency is not "correct" in the framework of the universal relation. The following example illustrates this fact.†

Let $\rho = (\langle R_1,F_1\rangle, \langle R_2,F_2\rangle, \langle R_3,F_3\rangle)$ be a database scheme where
$R_1 = \{A,B\}$, $R_2 = \{B,C\}$, $R_3 = \{A,C\}$, and $F_1 = \{A \to B\}$, $F_2 = \{B \to C\}$, $F_3 = \{A \to C\}$.
Consider the database instance:

       r1              r2              r3
     A   B           B   C           A   C
     1   1           1   1           1   2
     2   2           2   2           2   1

                   Figure 5.1

According to our definition, this database is consistent, since all three
dependencies are satisfied on the relations on which they are defined. But
notice that the C-value associated with an A-value directly in $r_3$ is not the same
C-value we get when we go from A, indirectly via B, to C: for A = 1, $r_3$ gives
C = 2, whereas $r_1$ gives B = 1 and $r_2$ then gives C = 1. In other words, the
dependency $A \to C$ is not semantically equivalent to the syntactically identical
dependency that we derive when we apply transitivity to the dependencies
$A \to B$, $B \to C$.

† Since we only consider decompositions that preserve dependencies, we denote a relation
scheme in a decomposition as a pair $\langle R_i, F_i\rangle$, where $F_i$ are the projected dependencies of
$F$ onto $R_i$.
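The two routes from A to C in Figure 5.1 can be compared mechanically. The following Python sketch is only an illustration: the encoding of tuples as dictionaries and the helper `satisfies` are our own assumptions, not notation from the text. It checks each $F_i$ on its own relation and then derives C from A directly through $r_3$ and indirectly through $r_1$ and $r_2$.

```python
# Figure 5.1, one dict per tuple (an illustrative encoding).
r1 = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]
r2 = [{"B": 1, "C": 1}, {"B": 2, "C": 2}]
r3 = [{"A": 1, "C": 2}, {"A": 2, "C": 1}]


def satisfies(rel, lhs, rhs):
    """True if every two tuples that agree on lhs also agree on rhs."""
    seen = {}
    for t in rel:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# Each dependency holds on the relation it is defined on.
local_ok = (satisfies(r1, ["A"], ["B"])
            and satisfies(r2, ["B"], ["C"])
            and satisfies(r3, ["A"], ["C"]))

# C reached from A directly (r3) versus via B (r1, then r2).
direct = {t["A"]: t["C"] for t in r3}
b_of = {t["A"]: t["B"] for t in r1}
c_of = {t["B"]: t["C"] for t in r2}
via_b = {a: c_of[b] for a, b in b_of.items()}

print(local_ok)        # True
print(direct, via_b)   # {1: 2, 2: 1} {1: 1, 2: 2}
```

Every local check succeeds, yet the two mappings from A to C disagree for both A-values.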

Another way to define what we mean by consistency of a database is to consider
the corresponding (according to the UR-assumption) instance of the universal
relation, and claim that the database is consistent if the dependencies are
satisfied on this instance. This, of course, assumes that such a UR-instance
exists. Hence, given a database instance and a set of dependencies, answering the
question "Is the database consistent?" first requires an answer to the question
"Is the database join-consistent?". It is noteworthy that our example database is
not join-consistent: the natural join of $r_1$, $r_2$ and $r_3$ is empty, so its
projections cannot reproduce the stored relations.
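A naive join-consistency check can be sketched as follows, assuming each relation is given as a pair (attribute list, list of dictionaries); the helper names are hypothetical. It joins all relations and tests whether projecting the join back onto every scheme reproduces the stored relation. The sketch makes no attempt at efficiency: intermediate joins can grow very large, in line with the NP-completeness result cited above.

```python
from functools import reduce


def natural_join(r, s):
    """Natural join of two relations given as (attributes, rows)."""
    attrs_r, rows_r = r
    attrs_s, rows_s = s
    common = [a for a in attrs_r if a in attrs_s]
    attrs = attrs_r + [a for a in attrs_s if a not in attrs_r]
    rows = [{**t, **u} for t in rows_r for u in rows_s
            if all(t[a] == u[a] for a in common)]
    return attrs, rows


def project(rows, attrs):
    return {tuple(t[a] for a in attrs) for t in rows}


def join_consistent(db):
    """Each relation must equal the projection of the full join on its scheme."""
    _, joined = reduce(natural_join, db)
    return all(project(joined, attrs) == project(rows, attrs)
               for attrs, rows in db)


fig_5_1 = [
    (["A", "B"], [{"A": 1, "B": 1}, {"A": 2, "B": 2}]),
    (["B", "C"], [{"B": 1, "C": 1}, {"B": 2, "C": 2}]),
    (["A", "C"], [{"A": 1, "C": 2}, {"A": 2, "C": 1}]),
]
print(join_consistent(fig_5_1))  # False: the join is empty
```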

Since the consistency of the database is tested on a corresponding universal
relation instance, the definition of schema equivalence is now trivially correct.
Uniqueness of FDs and completeness of their inference rules are guaranteed. Thus,
the invocation of the UR-assumption circumvents the problems in defining
equivalence.

5.3.  A  Critique  of  the  UR-assumption. 

In  this  section  we  criticize  the  UR-assumption  both  on  pragmatic  and  on 
semantic  grounds.  We  present  two  arguments  against  the  UR-assumption: 

1. It is computationally too hard to maintain;

2. it defeats the purpose of decomposition.

We now elaborate on the above two points. Considering the UR-assumption as an
additional constraint on database instances, we argue that it is not a
"reasonable" one; in the present context, by "unreasonable" we mean that it could
never be maintained efficiently in a real database. As proven in [Honeyman et al 80],
testing for join-consistency is an NP-complete problem. An even stronger result in
[Honeyman et al 80] shows that, even when the database is join-consistent, testing
whether it remains join-consistent after a simple change (e.g. the insertion of a
tuple) is again an NP-complete problem. Since join-consistency is a prerequisite
of database consistency, we cannot expect (unless P = NP) to have databases in
practice where consistency is tested in a reasonable amount of time.

Apart from the computational complexity problems that the UR-assumption
introduces, it also creates serious semantic problems. In particular, it defeats
the purpose of normalization/decomposition. It has been shown that normalized
databases which are join-consistent still have modification anomalies [Bernstein
and Goodman 80]. This gap between the static properties of databases
(normalization) and their dynamic nature (changes of values) under the
UR-assumption is summarized in the following basic result.

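Proposition 5.1 [Bernstein and Goodman 80]

Under the UR-assumption, a multi-relation database schema
$\rho = (\langle R_1,F_1\rangle, \ldots, \langle R_k,F_k\rangle)$, with $\langle R,F\rangle$ the universal relation, where
$R = \bigcup_{i=1}^{k} R_i$ and $F^* = \left(\bigcup_{i=1}^{k} F_i\right)^*$, is free of modification anomalies iff

(a) $\rho$ is in BCNF;

(b) for all $R_i, R_j$: if $R_i \cap R_j \neq \emptyset$, then the FDs $R_i \to R_j$ and $R_j \to R_i$ are in $F^*$. ■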

The above conditions on database schemas are quite restrictive. For
instance, the example in section 5.1 will have modification anomalies. Recall
that we had two relations, EMPLOYEE and DEPARTMENT, with the dependencies
employee# → department# and department# → manager#, contract-type. Both
relations are in BCNF, but condition (b) of Proposition 5.1 is violated: the two
relations share department#, and EMPLOYEE → DEPARTMENT is in $F^*$, but
DEPARTMENT → EMPLOYEE is not, since a department does not determine an employee.
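Condition (b) can be checked with the standard attribute-closure computation. The Python sketch below assumes that $F$ is given as a list of (left-hand side, right-hand side) attribute sets and reads condition (b) as requiring $R_i \to R_j$ and $R_j \to R_i$ whenever the schemes intersect; the function names are illustrative only.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def condition_b(ri, rj, fds):
    """Condition (b) of Proposition 5.1 for one pair of schemes."""
    if not ri & rj:
        return True
    return rj <= closure(ri, fds) and ri <= closure(rj, fds)


F = [({"employee#"}, {"department#"}),
     ({"department#"}, {"manager#", "contract-type"})]
EMPLOYEE = {"employee#", "department#"}
DEPARTMENT = {"department#", "manager#", "contract-type"}

print(DEPARTMENT <= closure(EMPLOYEE, F))    # True
print(EMPLOYEE <= closure(DEPARTMENT, F))    # False
print(condition_b(EMPLOYEE, DEPARTMENT, F))  # False
```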

In order to arrive at the result of Proposition 5.1, [Bernstein and Goodman 80]
show that any attempt to preserve join-consistency by artificially supplying
(deleting) values for the database whenever an insertion (deletion) is performed
in a single relation has unpredictable effects. We may also reach the same
conclusion in a way that is easier to accept and to understand. The UR-assumption
is not an intuitively simple concept. Instead of criticizing it directly, we
criticize one of its consequences. All join-consistent databases have the common
intersection property (CIP), which states that in a join-consistent database
every pair of relations agrees on their common attribute values. Formally, for
every $r_i, r_j$ in the database with $X = R_i \cap R_j$, we have $\pi_X(r_i) = \pi_X(r_j)$.
Simply put, CIP implies that if a value for a common attribute appears in a
relation, it should also appear in all other relations where this attribute is
common. In a database where CIP is not preserved, tuples which do not join with
any other tuple ("dangling" tuples) may appear.

Consider again the example from [Codd 72] which we presented in section
5.1, and the motivation behind decompositions. We decompose because we want
to provide the ability to keep "dangling tuples" (e.g. the second tuple in $r_2$) in
the database.†

         r1                   r2
     E#    D#          D#    M#    CT
     11    1           1     10    x
                       2     20    y

                  Figure 5.2

In a decomposed schema we come closer to dealing with "independent
facts". Information about employees and information about departments should
be separated, thus providing the ability to deal with each individually. Having
the instance in Figure 5.2 should be considered an advantage of decomposition.
The "dangling tuple" in $r_2$ describes a department which does not necessarily
have employees. But join-consistency requirements object to such an instance!
CIP does not hold (department 2 appears in $r_2$ but not in $r_1$), therefore the
database is not join-consistent.
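CIP itself is easy to test pairwise. The following is a minimal Python sketch, assuming the same (attributes, rows) encoding and hypothetical function names, applied to the instance of Figure 5.2 with the shorthand attribute names. Pairs of schemes with no common attribute are simply skipped here, which is our own simplification.

```python
def cip_holds(db):
    """Common intersection property: pi_X(ri) == pi_X(rj) for X = Ri ∩ Rj.

    db is a list of (attributes, rows) pairs; rows are dicts.
    """
    for i, (attrs_i, rows_i) in enumerate(db):
        for attrs_j, rows_j in db[i + 1:]:
            common = [a for a in attrs_i if a in attrs_j]
            if not common:
                continue  # our simplification: disjoint schemes are skipped
            proj_i = {tuple(t[a] for a in common) for t in rows_i}
            proj_j = {tuple(t[a] for a in common) for t in rows_j}
            if proj_i != proj_j:
                return False
    return True


r1 = (["E#", "D#"], [{"E#": 11, "D#": 1}])
r2 = (["D#", "M#", "CT"], [{"D#": 1, "M#": 10, "CT": "x"},
                           {"D#": 2, "M#": 20, "CT": "y"}])
print(cip_holds([r1, r2]))  # False: department 2 has no employee in r1
```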


† For economy of space we use a shorthand notation for the attribute names (E# for employee#, D# for department#, M# for manager#, CT for contract-type).
We now show that attempts to preserve CIP "artificially" are bound to fail.
"Artificial" preservation implies that we should insert a tuple (something, 2) in
$r_1$. This "artificial" preservation of CIP defeats one of the goals of
decomposition: we introduce unnecessary redundancy. Preserving CIP also means that
when we delete the last employee in a department, we lose all the information
about the department, which is exactly the type of modification anomaly we were
trying to avoid with the decomposition! In addition, how do we choose these
"somethings"? [Bernstein and Goodman 80], in criticizing the UR-assumption, chose
to use regular values that are already in the database. Of course, this is
unrealistic: the database makes commitments on connections that the user did not
specify and, even worse, makes them in an arbitrary way. Even when the
uncommitted value (missing) is used for this "something", the arbitrary
commitments may now be avoided, but unnecessary redundancy is still introduced.
Even worse, in light of the results of chapter 3, there is also a heavy penalty in
query evaluation when nulls are stored explicitly in the database.

In the following sections we again use null values, and the theory developed
for null values, to tackle and provide solutions to the above-mentioned problems
introduced by join-consistency requirements. More importantly, though, the
philosophy behind our approach is that nulls should not be introduced (stored
explicitly) in the database. We use nulls only for technical reasons, i.e. to test
for consistency.


5.4.  Weakening  the  UR-Assumption 

We are now ready to raise the obvious question: "Is the UR-assumption
really necessary (at least in this form) for relational database design theory?"
The answer to this question is a strong NO. The reasons for invoking the
UR-assumption stem from the problems encountered in defining database schema
equivalence and, in particular, in defining FD-consistency for a multi-relation
database. The UR-assumption gives us a framework to test consistency by
establishing a correspondence between any instance of the multi-relation database
and the natural join of all relations in the database. The join is then projected
on each relation scheme to verify join-consistency (the projections must be equal
to the database relation instances). After this test, the functional dependencies
are tested for satisfiability on the join. In summary, we need a universal
relation instance to test FD-satisfiability.

We believe that there is a need for some universal relation instance. After
all, all attribute values (present or not) are related, in that there is only one
portion of the real world being modelled. We also want to use this instance as a
framework for functional dependency satisfaction, verifying that no
inconsistencies are introduced in the database. However, we do not need a fully
determined instance, as long as we know that there exists at least one such
instance where all the dependencies hold. Such an incomplete instance is
"controlled", in that there is always a way to generate it from the current
database state if all information were available. Instead of using the join for
the consistency test, we choose a different universal relation instance, which we
argue is a more natural choice. In addition,

1. testing FD-satisfiability on this instance is "easy" (polynomial time);

2. the technical benefits of invoking the UR-assumption are achieved;

3. the undesirable effects of the UR-assumption (e.g. modification anomalies
even in normalized schemas) are avoided.
Definition 5.2

For each instance $(r_1, r_2, \ldots, r_k)$ of $\rho$ with universal relation $R = \bigcup_{i=1}^{k} R_i$,
define an instance $I$ of $R$ as follows:

I1. $n = \sum_{i=1}^{k} n_i$;†

I2. $t \in I$ iff there exist $i$ and $t_i \in r_i$ such that $t[R_i] = t_i$ and $t[R - R_i] = \vartheta$.

The universal relation instance $I$ is called a natural representative instance of
$(r_1, r_2, \ldots, r_k)$. ■


Intuitively, we may consider $I$ as the universal relation instance which
results when every tuple insertion in the database, say of tuple $t_i$ in $r_i$, is
translated into an insertion in the universal relation. Since the only values
specified are for a subset of the attribute set $R$ (in this case for $R_i$), we have
the missing null value for the rest of the attributes. These nulls represent an
implicit imperfection of the database. There is no need to store these nulls,
since they do not add to the information content of the database.
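Definition 5.2 translates almost literally into code. In this Python sketch (an illustration under our own assumptions), each missing value is a fresh `Null` object, so that distinct nulls stand for possibly different unknown values; `representative_instance` and the dictionary encoding are hypothetical names, not part of the text.

```python
import itertools

_ids = itertools.count(1)


class Null:
    """A marked 'missing' null: each object stands for its own unknown value."""

    def __init__(self):
        self.id = next(_ids)

    def __repr__(self):
        return f"null{self.id}"


def representative_instance(universe, db):
    """Pad every stored tuple with fresh nulls on R - Ri (Definition 5.2).

    db is a list of (attributes, rows) pairs; rows are dicts.
    """
    inst = []
    for attrs, rows in db:
        for t in rows:
            inst.append({a: (t[a] if a in attrs else Null()) for a in universe})
    return inst


fig_5_2 = [
    (["E#", "D#"], [{"E#": 11, "D#": 1}]),
    (["D#", "M#", "CT"], [{"D#": 1, "M#": 10, "CT": "x"},
                          {"D#": 2, "M#": 20, "CT": "y"}]),
]
I = representative_instance(["E#", "D#", "M#", "CT"], fig_5_2)
for row in I:
    print(row)
```

As required by I1, the instance has one tuple per stored tuple, three in this case.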

The  representative  instance  I  has  some  interesting  properties. 


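Proposition 5.2

(P1) $r_i \subseteq \pi_{R_i}(I)$;

(P2) if for no $R_i, R_j$ ($i \neq j$) we have $R_i \subseteq R_j$, then for all $u \in (\pi_{R_i}(I) - r_i)$, $u$ has at least one null value;

(P3) if $r$ is an instance of $R$ such that $r_i \subseteq \pi_{R_i}(r)$ for all $i$, then there exists a completion $I'$ of $I$ such that $I' \subseteq r$.

Proof

(P1) and (P2) are trivial by construction of $I$. For (P3) it suffices to show that
there exists a substitution of nulls in $I$ (not necessarily unique), $\varphi_r$, such
that $\varphi_r(I) \subseteq r$. Such a map is readily constructed: for each tuple of $I$
generated from $t_i \in r_i$, choose a tuple $u \in r$ with $u[R_i] = t_i$ (one exists
because $r_i \subseteq \pi_{R_i}(r)$) and replace the nulls of that tuple by the
corresponding values of $u$. ■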

† $n$ is the number of tuples (size) of $I$ and $n_i$ is the size of $r_i$.
Property (P3) verifies that $I$ is the smallest in the partial order of instances
$r$ of $R$ with the property $r_i \subseteq \pi_{R_i}(r)$. We call all such instances covering
universal instances. A covering instance $r$ is called exact if there exists a
completion $I'$ of $I$ such that $I' = r$.

The definition of database consistency for such a universal relation instance
is now presented.

Definition 5.3

Let $\rho$ be a database schema with $R$ its corresponding universal relation. An
instance $(r_1, r_2, \ldots, r_k)$ of $\rho$ is weakly (strongly) consistent (with respect to a
set of FDs $F$) if the representative universal relation instance $I$ is
weakly (strongly) consistent. ■

In the sequel we will use the term consistent to mean weakly consistent, unless
otherwise specified. Consistency of $I$ with respect to $F$ can be tested, as was
proven in chapter 4, in time $O(n \cdot |F| \cdot \log(n \cdot |F|))$. The NS-rules are applied
on $I$, producing a "minimally incomplete" instance, which we denote by $I^*$.
Then, the algorithm in figure 4.7 is used on $I^*$ with the convention of theorem 4.3
about nulls. We call this database consistency test global, since it operates on
the representative instance of the entire database.
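The effect of the NS-rules can be illustrated with a simple chase. The Python sketch below is not the $O(n \cdot |F| \cdot \log(n \cdot |F|))$ algorithm of chapter 4 (figure 4.7); it is a naive fixpoint loop that we assume here for illustration. Nulls are marked objects merged with union-find: two tuples that agree on the left-hand side of an FD have their right-hand sides identified, and the test fails as soon as two different constants would have to be equal. We assume infinite attribute domains, so that any null left at the end can take a fresh value. Applied to the representative instance of Figure 5.1, it reports an inconsistency.

```python
class Null:
    """A marked null; distinct objects are distinct unknown values."""


def weakly_consistent(rows, fds):
    """Chase rows with the NS-rules; return (ok, completed rows or None).

    rows: list of dicts whose values are constants or Null objects.
    fds:  list of (lhs, rhs) attribute lists.
    """
    parent = {}  # union-find links from Null objects to their representative

    def find(v):
        while isinstance(v, Null) and v in parent:
            v = parent[v]
        return v

    def unify(x, y):
        x, y = find(x), find(y)
        if x == y:
            return True
        if isinstance(x, Null):
            parent[x] = y
        elif isinstance(y, Null):
            parent[y] = x
        else:
            return False  # two different constants: no completion satisfies F
        return True

    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i, s in enumerate(rows):
                for t in rows[i + 1:]:
                    if all(find(s[a]) == find(t[a]) for a in lhs):
                        for a in rhs:
                            if find(s[a]) != find(t[a]):
                                if not unify(s[a], t[a]):
                                    return False, None
                                changed = True
    return True, [{a: find(v) for a, v in t.items()} for t in rows]


fig_5_1 = [{"A": 1, "B": 1}, {"A": 2, "B": 2},   # r1
           {"B": 1, "C": 1}, {"B": 2, "C": 2},   # r2
           {"A": 1, "C": 2}, {"A": 2, "C": 1}]   # r3
I = [{a: (t[a] if a in t else Null()) for a in "ABC"} for t in fig_5_1]
F = [(["A"], ["B"]), (["B"], ["C"]), (["A"], ["C"])]
ok, _ = weakly_consistent(I, F)
print(ok)  # False
```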

Proposition 5.3

Let $(r_1, r_2, \ldots, r_k)$ be a database and $I$ its representative universal relation
instance. For any exact covering instance $r$ the following hold:

1. If $I$ is not consistent, then $r$ is also not consistent;

2. if $r$ is consistent, then $I$ is also consistent (contrapositive).

Proof

By definition, when the consistency test on $I$ is negative, there is no
substitution of null values producing an instance $r'$ from $I$ where the
dependencies are satisfied. Since $r$ is itself a completion of $I$, every completion
of $r$ is also a completion of $I$, so no completion of $r$ satisfies the dependencies
either. For the contrapositive, the fact that the exact covering is consistent
demonstrates that there is a completion of $I$ where the FDs are satisfied; hence
$I$ is weakly consistent. ■


An example of an exact covering is the outer-join [Codd 80]. The outer-join,
denoted by $\otimes$, is an operation like the natural join, with the only difference
that if a tuple of a relation cannot be joined, then it is inserted in the result
with nulls for the other attributes. For instance, the outer-join of the relations
in Figure 5.2 is:

     E#    D#    M#    CT
     11    1     10    x
     ϑ     2     20    y

                  Figure 5.3


Clearly, the outer-join collapses to the natural join when all values can be
joined. The outer-join is not an associative operation: if we outer-join the
relations in Figure 5.1 in different orders, we get different results:


     (r1 ⊗ r2) ⊗ r3        r1 ⊗ (r2 ⊗ r3)        r2 ⊗ (r1 ⊗ r3)
      A   B   C             A   B   C             A   B   C
      1   1   1             1   1   ϑ             1   1   2
      2   2   2             2   2   ϑ             2   2   1
      1   ϑ   2             2   1   1             ϑ   1   1
      2   ϑ   1             1   2   2             ϑ   2   2

                              Figure 5.4

We denote any outer-join of relations $r_1, r_2, \ldots, r_k$ by $\bigotimes_{i=1}^{k} r_i$.

Proposition 5.4

Any outer-join is an exact covering.

Proof (trivial) ■
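The non-associativity shown in Figure 5.4 can be reproduced with a small full outer-join. This Python sketch assumes relations encoded as (attribute list, list of dictionaries) and uses `None` for the null; a tuple with a null on a common attribute is treated as not joinable, which is our own assumption about how nulls behave in the join.

```python
def outer_join(r, s):
    """Full outer-join of r and s, each given as (attributes, rows)."""
    attrs_r, rows_r = r
    attrs_s, rows_s = s
    common = [a for a in attrs_r if a in attrs_s]
    attrs = attrs_r + [a for a in attrs_s if a not in attrs_r]

    def joins(t, u):
        # A null (None) on a common attribute never matches anything.
        return all(t[a] is not None and t[a] == u[a] for a in common)

    out = [{**t, **u} for t in rows_r for u in rows_s if joins(t, u)]
    # Unmatched tuples are kept, padded with nulls on the other attributes.
    out += [{a: t.get(a) for a in attrs} for t in rows_r
            if not any(joins(t, u) for u in rows_s)]
    out += [{a: u.get(a) for a in attrs} for u in rows_s
            if not any(joins(t, u) for t in rows_r)]
    return attrs, out


def as_set(rel):
    return {tuple(t[a] for a in "ABC") for t in rel[1]}


r1 = (["A", "B"], [{"A": 1, "B": 1}, {"A": 2, "B": 2}])
r2 = (["B", "C"], [{"B": 1, "C": 1}, {"B": 2, "C": 2}])
r3 = (["A", "C"], [{"A": 1, "C": 2}, {"A": 2, "C": 1}])

left = as_set(outer_join(outer_join(r1, r2), r3))
middle = as_set(outer_join(r1, outer_join(r2, r3)))
right = as_set(outer_join(r2, outer_join(r1, r3)))
print(left != middle and middle != right and left != right)  # True
```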

Using the outer-join we can illustrate the results of Proposition 5.3. Since
the outer-join represents a completion of $I$, a positive global test does not
necessarily imply a positive consistency test on the outer-join. The reason is
that the global test verifies that there exists an instance (completion) of $I$
where the FDs are satisfied, and this instance may not be the one represented by
the outer-join. The following is the canonical counter-example.

Consider the relations $R_1 = \{X,A,B\}$, $R_2 = \{Y,A,C\}$, $R_3 = \{X,Y,B,C\}$, the
dependencies $XA \to B$, $YA \to C$, and the instance:

        r1                 r2                   r3
     X   A   B          Y   A   C          X   Y   B   C
     2   1   2          2   1   2          2   2   1   1

                          Figure 5.5

The global test will be positive on this instance, and if we test for consistency
on the outer-join $r_1 \otimes (r_2 \otimes r_3)$ we will also get a positive answer. But if
we test on another outer-join, we will get a negative answer.

The following result has been proven in [Honeyman 80].

Proposition 5.5 [Honeyman 80]

Let $\rho = (R_1, R_2, \ldots, R_k)$ be a decomposition of $R$ that has the lossless-join
property. For any weakly consistent instance of $\rho$, say $(r_1, r_2, \ldots, r_k)$, the
natural join $\Join_{i=1}^{k} r_i$ is contained in the minimally incomplete instance $I^*$. ■

The above proposition gives a polynomial algorithm to compute the natural
join of a set of relations. Construct the representative instance $I$, apply the
NS-rules, and check each element of the resulting instance $I^*$ for possible
membership in the natural join: for each tuple $t$ in $I^*$ and each $r_i$, verify that
$t[R_i] \in r_i$. Tuples that do not satisfy this condition cannot be in the natural join.
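The filtering step of this algorithm is sketched below in Python for the instance of Figure 5.2 with the dependencies E# → D# and D# → M#, CT (this decomposition has the lossless-join property, since D# is a key of the second scheme). To keep the example short, $I^*$ is entered by hand: we applied the NS-rule for D# → M#, CT to $I$ ourselves, and `None` stands for a remaining null. The function name is hypothetical.

```python
# I* for Figure 5.2, worked out by hand from the NS-rules (not computed here).
i_star = [
    {"E#": 11, "D#": 1, "M#": 10, "CT": "x"},
    {"E#": None, "D#": 1, "M#": 10, "CT": "x"},
    {"E#": None, "D#": 2, "M#": 20, "CT": "y"},
]
db = [
    (["E#", "D#"], [{"E#": 11, "D#": 1}]),
    (["D#", "M#", "CT"], [{"D#": 1, "M#": 10, "CT": "x"},
                          {"D#": 2, "M#": 20, "CT": "y"}]),
]


def join_from_minimal_instance(i_star, db):
    """Keep the tuples t of I* with t[Ri] in ri for every relation ri."""
    stored = [(attrs, {tuple(u[a] for a in attrs) for u in rows})
              for attrs, rows in db]
    return [t for t in i_star
            if all(tuple(t[a] for a in attrs) in keys for attrs, keys in stored)]


print(join_from_minimal_instance(i_star, db))
# [{'E#': 11, 'D#': 1, 'M#': 10, 'CT': 'x'}]
```

Tuples that still carry a null can never pass the membership test, since the stored relations contain no nulls.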
As a corollary of Propositions 5.3 through 5.5 we present the connection
between our weaker consistency test and the consistency test which is based on
the UR-assumption.

Theorem 5.1

For a join-consistent database which has the lossless-join property, the test
for FD-satisfiability on the natural join and the test for FD-satisfiability on the
representative instance are equivalent (they always have the same result).

Proof

The outer-join (an exact covering) collapses to the natural join for join-consistent
databases, since all tuples can be joined. From Proposition 5.3, if the natural join
is consistent then $I$ is consistent. Conversely, if $I$ is consistent, the natural join,
which by Proposition 5.5 is contained in $I^*$, is trivially consistent. ■

5.5.  Testing  for  Database  Consistency 

In this section we address the problem:

Given a consistent database, will it remain consistent after the insertion of a
tuple in a relation?

We discuss three different tests that may be performed to check consistency
after an insertion, and we also present their computational complexity and their
adequacy for testing consistency. In the following section we will characterize
the databases for which each test can be applied. We note that deleting a tuple
from a consistent database never leaves the database inconsistent: the deletion
only removes a tuple from $I$, and a satisfying completion restricted to the
remaining tuples still satisfies the dependencies.

The first test for database consistency is the global test, as discussed in
section 5.4. For each tuple insertion, the representative instance $I$ of the
database (with the new tuple) is first generated. Then the NS-rules are applied on
$I$ as many times as possible. Finally, the test for FD-satisfiability is made on the
resulting minimally incomplete instance $I^*$. The insertion of the tuple is
rejected if the minimally incomplete instance is not consistent. Even though the
global test requires polynomial time, it is still too expensive for any practical
use: the entire database has to be examined several times.

Alternatively, for some databases a less expensive localized test may be
performed. To illustrate this test, assume that a tuple $t_i$ is to be inserted in a
relation $r_i$ of a consistent database. Make "hypothetical" insertions of tuples $t_j$
in all other relations $r_j$ of the database. A tuple $t_j$ has the same values as
$t_i$ for common attributes, and nulls for the attributes which are not in $R_i$. Each
of the relations is tested locally for consistency after the "hypothetical"
insertions. If all relations are consistent, the tuple $t_i$ is inserted in $r_i$ and all
other "hypothetical" insertions are ignored.

More formally, the test is performed in two stages. After the hypothetical
insertions, each relation in the database has at most one tuple with null values.
In addition, some of these null values represent the same attribute values; this
happens for all common attributes. Therefore, we impose null equality
constraints. In the first stage, the null values may be substituted with regular
values when the NS-rules are applied. A substitution of a null value triggers the
substitution of all other nulls in the same equivalence class. Note that this
completion process never involves an NS-rule, corresponding to a dependency
$X \to Y$, where for two tuples in a relation both $Y$-values are null. This is
because at most one tuple can have nulls in any relation.

The localized test is performed with the algorithm LOCALIZED_TEST in figure 5.6.
In completing a relation, we search for a tuple in the relation which has the
same values as the new tuple for the attributes on the left-hand side of a
dependency. This is done for each dependency in one pass over the relation. It
can be done with the algorithm LOCALIZED_COMPLETION in figure 5.7 in time
$O(|F| \cdot n)$. The algorithm LOCALIZED_COMPLETION uses the algorithm
APPLY_NS_RULE in figure 5.8. Note that to complete a relation we need as many
passes as the number of dependencies $|F_i|$, and each pass takes time linear in
the size $n_i$ of the relation. Since this application of the rules has to be done
for all relations, the total time required is $O(n \cdot |F|)$.

Given a consistent database, this algorithm performs the localized
test for database consistency for a tuple insertion

LOCALIZED_TEST (parameters: <r1, r2, ..., rk>, <F1, F2, ..., Fk>, ti)
               returns(boolean)
begin
    F ← F1 ∪ F2 ∪ ... ∪ Fk
    call LOCALIZED_COMPLETION(<r1, r2, ..., rk>, F, ti)
    for each relation rm, m = 1, 2, ..., k
    begin
        if TEST_INSERTION(rm, Fm, tm) = no
        then return(no)
    end
    return(yes)
end

Complexity Analysis

The algorithm runs in time O(|F|·n). Localized completion requires this
much time. In addition, insertion testing is made for all relations in the
database, thus requiring O(|F|·n) time. In the special case where each relation
in the database has only one key dependency and is sorted on the key, the
algorithm requires time O(|F|·log n), where |F| is the number of database
relations (or dependencies) and n is the average relation size.

                              Figure 5.6

Given a consistent database, a set of FDs and a tuple insertion in a
database relation, this algorithm outputs a "minimally incomplete
state" of the database

LOCALIZED_COMPLETION (parameters: <r1, r2, ..., rk>, F, ti)
begin
    comment: "F can be applied for t" means: there exists
             a dependency X → Y in F such that t[X] contains no nulls.
             This condition is tested with the use of a "flag".
    comment: the algorithm APPLY_NS_RULE(X → Y, t) searches
             for the first tuple in the database where the NS-rule
             is applied for the null substitution
    comment: the tuple t is the universal relation tuple corresponding
             to ti (non-specified attribute values are ϑ's)

    while F can be applied for t do
    begin
        flag ← "no FD used in pass"
        for each dependency X → Y in F do
        begin
            if t[X] ≠ ϑ then
            begin
                if t[Y] = ϑ
                then APPLY_NS_RULE(X → Y, t)
                F ← F − {X → Y}
                flag ← "FD used"
            end
        end
    end
end

Complexity Analysis

The running time of this algorithm is dominated by the calls of
APPLY_NS_RULE. Each call requires time O(n), where n is the database size.
Since the call cannot be made more than once for each FD, the algorithm runs
in time O(|F|·n). The algorithm always terminates, but does not necessarily
produce a unique "minimally incomplete state": the resulting state depends on
the order in which the FD rules are applied. If there is an inconsistency, it
will be detected in any of these states.

                              Figure 5.7

This algorithm applies the null substitution rule

APPLY_NS_RULE (parameters: X → Y, t)
begin
    for each ri in <r1, r2, ..., rk> where XY ⊆ Ri do
    begin
        if there exists ti ∈ ri with ti[X] = t[X] then
        begin
            t[Y] ← ti[Y]      (substitute the nulls of t[Y])
            return
        end
    end
end

Complexity Analysis

In the worst case, the algorithm runs in time linear in the size of the
database. This happens when the FD is embedded in all relations and no
tuple is found in any of them. Assuming that a dependency is embedded in only
one relation, the algorithm requires time linear in the size of the relation
where the FD is embedded. Furthermore, if the relation is sorted on X, the
time needed may be O(log n).

                              Figure 5.8

Given a consistent relation r and a tuple t, test the consistency
of the relation after the insertion of the tuple

TEST_INSERTION (parameters: r, F, t) returns(boolean)
begin
    comment: read_tuple(r) reads a new tuple from r and returns
             EOF if there are no more tuples

    ti ← read_tuple(r)
    while ti ≠ EOF do
    begin
        for each X → Y in F do
        begin
            if ti[X] = t[X]
            then if ti[Y] ≠ t[Y]
                 then return(no)
        end
        ti ← read_tuple(r)
    end
    return(yes)
end

Complexity Analysis

The algorithm runs in time O(|F|·n), where n is the size of the relation and
|F| is the number of FDs. In the case of only one dependency (e.g. BCNF with
one key, sorted on the key), a simpler algorithm may be used. The simpler
algorithm looks for an equal key value and takes time O(log n), or time
independent of the relation size for a hashed key.

                              Figure 5.9
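A direct Python rendering of TEST_INSERTION (Figure 5.9) follows, together with the one-probe variant for a single hashed key mentioned in its complexity analysis. The function names, the dictionary encoding and the hash index are our own illustrative assumptions; a `None` in the new tuple is treated as a null that matches nothing and violates nothing.

```python
def check_insertion(r, fds, t):
    """True iff inserting t into r keeps every FD (lhs, rhs) satisfied.

    r: list of dicts; fds: list of (lhs, rhs) attribute lists; t: dict.
    """
    for u in r:                      # one pass over the relation
        for lhs, rhs in fds:
            agree = all(t[a] is not None and u[a] == t[a] for a in lhs)
            if agree and any(t[a] is not None and u[a] != t[a] for a in rhs):
                return False
    return True


def check_insertion_hashed(index, t, key):
    """Single key dependency: one dictionary probe instead of a scan.

    index maps key values to the stored tuple (a hypothetical hash index).
    """
    stored = index.get(tuple(t[a] for a in key))
    return stored is None or stored == t


r2 = [{"B": 1, "C": 1}]
print(check_insertion(r2, [(["B"], ["C"])], {"B": 1, "C": 2}))  # False
index = {(1,): {"B": 1, "C": 1}}
print(check_insertion_hashed(index, {"B": 1, "C": 2}, ["B"]))   # False
```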

In the second stage, we test for FD-satisfiability. This can be done with the
algorithm TEST_INSERTION in figure 5.9 and takes time, in the worst case, linear
in the size of the database. Of course, if we only have key dependencies (e.g.
BCNF schemas) and keys are hashed, then testing each dependency is actually
independent of the relation size and requires only one access. We illustrate
the localized test with an example.

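Let $\rho = (\langle R_1,F_1\rangle, \langle R_2,F_2\rangle, \langle R_3,F_3\rangle)$ be a database scheme where
$R_1 = \{A,B\}$, $R_2 = \{B,C\}$, $R_3 = \{A,C\}$, and $F_1 = \{A \to B\}$, $F_2 = \{B \to C\}$, $F_3 = \{A \to C\}$.

Consider an instance $(r_1, r_2, r_3)$ of $\rho$.

       r1             r2             r3
     A   B          B   C          A   C
     1   1          1   1         (empty)

                 Figure 5.10.1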


Suppose the tuple (1,2) is to be inserted in $r_3$. We make hypothetical
insertions in $r_1$ and $r_2$ to reach the state:

       r'1            r'2            r'3
     A   B          B   C          A   C
     1   1          1   1          1   2
     1   ϑ          ϑ   2

                 Figure 5.10.2


The algorithm LOCALIZED_COMPLETION is now used. Notice that only the
NS-rule corresponding to the FD $A \to B$ can be applied. The $\vartheta$ in the second
tuple of $r'_1$ becomes 1, and this triggers the substitution of the $\vartheta$ in $r'_2$
with the same value 1, since they belong to the same equivalence class. The
resulting minimally incomplete instance is:

       r''1           r''2           r''3
     A   B          B   C          A   C
     1   1          1   1          1   2
     1   1          1   2

                 Figure 5.10.3


The application of the algorithm TEST_INSERTION will show that $B \to C$ is
violated on $r''_2$. Therefore the original insertion is not allowed.

The localized test is not correct for all databases. We illustrate this fact
with a modification of the previous example. Consider again the same relation
schema $\rho$ and the instance $(r_1, r_2, r_3)$, with the difference that $F_1 = \emptyset$. This
time the tuple (1,2) will pass the localized test and will be inserted in the
database. Notice that no NS-rule can be applied on $(r'_1, r'_2, r'_3)$ and all
dependencies are weakly satisfied. On the other hand, if the global test is
performed, the representative instance $I$ of $(r_1, r_2, r_3 \cup \{(1,2)\})$ is:

             I
      A   B   C
      1   1   ϑ
      ϑ   1   1
      1   ϑ   2

        Figure 5.10.4

Applying the NS-rule for $A \to C$, we obtain 2 for the null in the first tuple. This
violates the dependency $B \to C$ (first and second tuples). The resulting database
state is inconsistent, but the localized test cannot detect the inconsistency.
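Both outcomes of the localized test can be reproduced with a compact Python sketch of Figures 5.6 to 5.9, written under our own assumptions: all hypothetical tuples $t_j$ are read from a single universal tuple $t$, so nulls on the same attribute automatically form one equivalence class, and `None` marks a null. The function names are hypothetical. With $F_1 = \{A \to B\}$ the insertion of (1,2) into $r_3$ is rejected; with $F_1 = \emptyset$ it is accepted, even though, as shown above, the global test rejects it.

```python
def check_insertion(r, fds, t):
    """Stage 2 (Figure 5.9); a None in t matches nothing and violates nothing."""
    for u in r:
        for lhs, rhs in fds:
            agree = all(t[a] is not None and u[a] == t[a] for a in lhs)
            if agree and any(t[a] is not None and u[a] != t[a] for a in rhs):
                return False
    return True


def apply_ns_rule(db, lhs, rhs, t):
    """Figure 5.8: fill nulls of t[rhs] from the first tuple agreeing on lhs."""
    for attrs, rows, _ in db:
        if set(lhs) | set(rhs) <= set(attrs):
            for u in rows:
                if all(u[a] == t[a] for a in lhs):
                    for a in rhs:
                        if t[a] is None:
                            t[a] = u[a]
                    return


def localized_test(db, new):
    """db: list of (attrs, rows, fds); new: the tuple to insert, as a dict."""
    universe = {a for attrs, _, _ in db for a in attrs}
    t = {a: new.get(a) for a in universe}  # hypothetical insertions share t
    pending = [(lhs, rhs) for _, _, fds in db for lhs, rhs in fds]
    progress = True
    while progress:                        # stage 1: localized completion
        progress = False
        for lhs, rhs in list(pending):
            if all(t[a] is not None for a in lhs):
                if any(t[a] is None for a in rhs):
                    apply_ns_rule(db, lhs, rhs, t)
                pending.remove((lhs, rhs))
                progress = True
    return all(check_insertion(rows, fds, {a: t[a] for a in attrs})
               for attrs, rows, fds in db)


def example(f1):
    db = [(["A", "B"], [{"A": 1, "B": 1}], f1),
          (["B", "C"], [{"B": 1, "C": 1}], [(["B"], ["C"])]),
          (["A", "C"], [], [(["A"], ["C"])])]
    return localized_test(db, {"A": 1, "C": 2})


print(example([(["A"], ["B"])]))  # False: B -> C is violated, as in Figure 5.10.3
print(example([]))                # True: the localized test misses the violation
```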

The global test can be thought of as proceeding exactly like the localized test,
with the difference that when the relations are completed and found consistent,
all tuples, even the ones with null values, are retained in the database.

Our final test for consistency is called independent. As the name suggests,
the consistency test for the insertion of tuple $t$ in relation $r$ is made only on
$r$, independently of the other relations in the database. This test requires
the least computational effort: algorithm TEST_INSERTION in figure 5.9 is
used directly.

Proposition 5.6

In a given database $(r_1, r_2, \ldots, r_k)$ the following holds:

Global test ⇒ Localized test ⇒ Independent test

Assuming that we ask the question "Is the database consistent after an
insertion?", "⇒" means: a yes answer from the test on the left implies a yes
answer from the test on the right.

Proof (trivial)

All tests can be seen as performed on the representative universal relation
instance, except that the localized and independent tests are made on smaller
subsets of the instance. ■

What we are interested in is finding those cases where each test is sufficient
for testing database consistency. Thus, we aim at characterizing schemas
where:

Independent test ⇒ Localized test ⇒ Global test

or

Localized test ⇒ Global test
5.6. Tree-Structured Databases

In this section we characterize database schemas for which the easier consistency tests can be made correctly. First, we give some definitions. Let $p$ be a multi-relation database schema which embeds a cover of $F$. An FD $X \to Y$ is called intra-relational if $XY \subseteq R_i$ for some $R_i$ in $p$. Similarly, an FD $X \to Y$ is called inter-relational if $XY \not\subseteq R_i$ for every $R_i$ in $p$. In a schema $p$ which embeds a cover of $F$, if $X \to Y$ is an inter-relational FD, then there exist two intra-relational FDs $X \to W$ and $WZ \to Y$ in $F^+$. For $X \to Y$ to be violated, one of the intra-relational FDs must be violated. We illustrate with an example that intra-relational FDs that involve common attributes may be locally satisfied but globally violated. Consider the two-relation schema $p = \{\langle R_1, F_1 \rangle, \langle R_2, F_2 \rangle\}$ where $R_1 = \{A, B, D\}$, $R_2 = \{A, B, C\}$, $F_1 = \emptyset$ and $F_2 = \{A \to C, B \to C\}$. Let $r_1, r_2$ be a globally consistent state of $p$.


r1:   A   B   D
      1   1   1

r2:   A   B   C
      1   2   1

Figure 5.11

The insertion of the tuple (2, 1, 2) in $r_2$ will not violate any FD when tested with the localized test, but will result in a globally inconsistent state. This can be easily observed in the representative instance I.


I:    A   B   C   D
      1   1   -   1
      1   2   1   -
      2   1   2   -

Figure 5.12 (nulls shown as -)
The null in the first tuple of I must be 1 because of $A \to C$ and 2 because of $B \to C$.
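The example can be replayed mechanically. The sketch below builds the representative instance I by padding each tuple with None (standing for the missing null) and then applies a simplified null-substitution chase: whenever two tuples agree, with no nulls, on the left side X of an FD X -> A, a null in A is replaced by the other tuple's constant, and two different constants in A signal an inconsistency. This simplified rule is an assumption standing in for the NS-rules, not a transcription of them; `representative_instance` and `chase_consistent` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[int]]  # None plays the role of the missing null
FD = Tuple[Sequence[str], str]  # (X, A) stands for X -> A


def representative_instance(relations: List[List[Row]], universe: Sequence[str]) -> List[Row]:
    """Pad every tuple of every relation with None on the attributes it lacks."""
    return [{a: t.get(a) for a in universe} for r in relations for t in r]


def chase_consistent(rows: List[Row], fds: List[FD]) -> bool:
    """Simplified NS-style chase: fill nulls from FDs, fail on a constant clash."""
    rows = [dict(t) for t in rows]  # do not modify the caller's tuples
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            for t, s in combinations(rows, 2):
                # The rule fires only when t and s agree on X with no nulls.
                if any(t[x] is None or t[x] != s[x] for x in lhs):
                    continue
                if t[a] is None and s[a] is not None:
                    t[a] = s[a]
                    changed = True
                elif s[a] is None and t[a] is not None:
                    s[a] = t[a]
                    changed = True
                elif t[a] is not None and t[a] != s[a]:
                    return False  # t and s agree on X but carry different constants in A
    return True


U = ["A", "B", "C", "D"]
F = [(("A",), "C"), (("B",), "C")]
r1 = [{"A": 1, "B": 1, "D": 1}]
r2 = [{"A": 1, "B": 2, "C": 1}]
print(chase_consistent(representative_instance([r1, r2], U), F))  # True
r2.append({"A": 2, "B": 1, "C": 2})  # the insertion of (2, 1, 2) into r2
print(chase_consistent(representative_instance([r1, r2], U), F))  # False
```

The second call fails exactly as in Figure 5.12: the null in C of the first tuple receives 1 from one FD and clashes with 2 through the other.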


Lemma 5.1

Let $p = \{\langle R_1, F_1 \rangle, \langle R_2, F_2 \rangle\}$ be a two-relation schema which is dependency preserving. Intra-relational dependencies on instances of $p$ are never violated globally unless either:

1. they are violated locally; or

2. $X = R_1 \cap R_2 \neq \emptyset$ and there exist proper subsets $W$, $Z$ of $X$ such that, for an attribute $A$ not in $X$, we have $A \in W^+$, $A \in Z^+$, $W \not\subseteq Z^+$, and $Z \not\subseteq W^+$.

Proof 

The first part of the proof is trivial. If one of the two conditions in the lemma is met, then the instance of $p$ may not be globally consistent; the previous example is an illustration of this fact. For the other direction, we will show that if an instance of $p$ is not consistent globally but is locally consistent, then condition 2 in the lemma is met. Assume that $(r_1, r_2)$ is locally consistent but not globally consistent. Therefore, there are two tuples $t_1, t_2$ in the minimally incomplete instance of $R = R_1 \cup R_2$ (after the application of the NS-rules on I) and an FD $W \to A$ such that $t_1[W] = t_2[W]$ but $t_1[A] \neq t_2[A]$. We may further assume without loss of generality that $WA \subseteq R_1$ or $WA \subseteq R_2$, since all inter-relational constraints can be recovered as two intra-relational constraints. It must be that $A \notin X = R_1 \cap R_2$: if $A$ were in $X$, then $W \to A$ would have been locally violated, since all common attribute values have no nulls. We show that $W \subseteq X$. Suppose otherwise. If $W \cap X = \emptyset$, then $W \to A$ would have been violated locally. If $W \cap X \neq \emptyset$ but $W \not\subseteq X$, then we would have had in the minimally incomplete instance the tuples $t_1, t_2$ having nulls for some of their common attributes. Therefore, the FD $W \to A$ cannot be violated on the two tuples. We now show that $t_1$ and $t_2$ in the minimally incomplete instance come from different relations. This is because one of $t_1[A]$, $t_2[A]$ must have been null in the representative instance I (before the rules were applied); if neither was null, then the inconsistency would have been detected with the localized test. Let $t_1$ be the tuple from $r_1$. Since $t_1[A]$ was initially null, there must be an FD $Z \to A$ and an NS-rule corresponding to this FD that changed the null to a constant. In addition, for the rule to be applied, there must be a tuple $t_3$ in the minimally incomplete instance such that $t_3[Z] = t_1[Z]$ and $t_3[A]$ is a constant. Note that $t_2, t_3$ come from $r_2$ and $t_2[ZW] \neq t_3[ZW]$. Since $t_3[Z]$ has all constants, all attributes in $Z$ are common; hence, $Z \subseteq X$. It can be readily seen that $W \not\subseteq Z^+$, since if there were an FD $Z \to W$ in $F^+$, its violation would have been detected locally (common attributes). Similarly, $Z \not\subseteq W^+$. From $W \subseteq X$, $Z \subseteq X$, $Z \not\subseteq W^+$ and $W \not\subseteq Z^+$ it follows that $W \subset X$ and $Z \subset X$. ■


Lemma 5.2

Both  conditions  in  lemma  5.1  can  be  tested  in  polynomial  time. 

Proof 

The determination of whether a schema $p$ embeds a cover can be made in polynomial time [Honeyman 80a]. Local violation of FDs can also be detected in polynomial time. For the second condition, we have to calculate the closure of all attributes which are not common (i.e. $A^+$ for every $A \notin X$). Then, for each set in the closure which is also a subset of the common attribute set, we calculate its closure, and we test whether there exists a pair of such subsets where neither is included in the closure of the other. Calculating attribute closures requires polynomial time. ■
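To experiment with condition 2 of lemma 5.1 on small schemas, the sketch below pairs the standard attribute-closure algorithm (the polynomial ingredient the proof relies on) with an exhaustive search over proper subsets $W$, $Z$ of $X$. The exhaustive search is exponential in $|X|$ and is not the polynomial procedure described above; it is only a brute-force cross-check. FDs are represented as pairs of frozensets, and all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, List, Optional, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (X, Y) stands for X -> Y


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure attrs+ under fds (polynomial in the size of the input)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def proper_nonempty_subsets(x: FrozenSet[str]) -> Iterator[FrozenSet[str]]:
    items = sorted(x)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield frozenset(combo)


def condition_2_witness(r1: Iterable[str], r2: Iterable[str],
                        fds: List[FD]) -> Optional[Tuple[FrozenSet[str], FrozenSet[str], str]]:
    """Return some (W, Z, A) meeting condition 2 of lemma 5.1, or None (brute force)."""
    x = frozenset(r1) & frozenset(r2)
    outside = (frozenset(r1) | frozenset(r2)) - x
    subsets = list(proper_nonempty_subsets(x))
    plus = {w: closure(w, fds) for w in subsets}
    for w, z in combinations(subsets, 2):
        if w <= plus[z] or z <= plus[w]:
            continue  # one side lies inside the other's closure
        for a in sorted(outside):
            if a in plus[w] and a in plus[z]:
                return (w, z, a)
    return None


F2 = [(frozenset("A"), frozenset("C")), (frozenset("B"), frozenset("C"))]
print(condition_2_witness("ABD", "ABC", F2))  # (frozenset({'A'}), frozenset({'B'}), 'C')
```

On the schema of Figure 5.11 the witness is $W = \{A\}$, $Z = \{B\}$ with the non-common attribute $C$, which is why the localized test missed the violation there.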


We say that a dependency $Y \to Z$ in $F^+$ is shared if $YZ \subseteq X = R_1 \cap R_2$. Let $F^+|_X$ denote the set of all FDs in $F^+$ projected on $X$ (i.e. the FDs that are shared by $R_1$ and $R_2$).


Lemma  5.3 
The test $F^+|_X = \emptyset$ can be made in polynomial time.

Proof

If $Y \to A \in F^+|_X$ with $A \notin Y$ and $Y \subseteq X$, then $(X - \{A\}) \to A \in F^+|_X$. In addition, $A \in (X - \{A\})^+$ (since $Y \subseteq X - \{A\}$). We may now use the following algorithm for the test. The algorithm returns yes when $F^+|_X = \emptyset$.


begin
    for each A ∈ X
        calculate (X - {A})+
        if A ∈ (X - {A})+
            then return (no)
    end
    return (yes)
end
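The algorithm above translates almost line for line into Python. This is a sketch that assumes FDs are given as pairs of attribute sets; `closure` is the standard attribute-closure loop, and `no_shared_fds` is a hypothetical name for the test, returning True for "yes".

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (X, Y) stands for X -> Y


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def no_shared_fds(x: Iterable[str], fds: List[FD]) -> bool:
    """Lemma 5.3 test: True ("yes") when F+ projected on X has no nontrivial FD."""
    xs = set(x)
    for a in sorted(xs):                 # for each A in X
        if a in closure(xs - {a}, fds):  # is A in (X - {A})+ ?
            return False                 # (X - {A}) -> A is a shared FD: "no"
    return True                          # "yes"


F = [(frozenset("A"), frozenset("C")), (frozenset("B"), frozenset("C"))]
print(no_shared_fds("AB", F))  # True: neither A nor B determines the other
G = [(frozenset("A"), frozenset("C")), (frozenset("C"), frozenset("B"))]
print(no_shared_fds("AB", G))  # False: A -> B is in G+ although not in G
```

The second call shows why the closure is needed: the shared FD $A \to B$ is only implied, through the non-common attribute $C$.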


Lemma  5.4 

Let $p$ be a two-relation schema. If the following conditions are met:

1. $p$ is dependency preserving;

2. there are no FDs $W \to A$, $Z \to A$ in $F^+$ such that $WZ \subseteq X = R_1 \cap R_2$, $A \notin X$, $W \not\subseteq Z^+$, and $Z \not\subseteq W^+$,

then localized test => global test, for all instances of $p$.

If, furthermore,

3. there are no shared FDs,

then independent test => global test, for all instances of $p$.

Proof (trivial from lemma 5.1). ■

Two-relation  database  schemas  are  special  cases  of  a  family  of  schemas 
which  we  call  tree-structured  schemas.  These  schemas  have  the  characteristic 
that  no  attribute  is  common  to  more  than  two  relations.  If  schemas  are 
represented  by  graphs  where  relations  are  the  nodes  and  an  edge  exists 
between  two  nodes  if  they  share  an  attribute,  then  the  graph  of  these  schemas  is 
a tree.
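Checking whether a schema is tree-structured is straightforward. The sketch below takes a schema as a mapping from relation names to attribute sets, rejects any attribute shared by more than two relations, and then verifies that the relation graph just described is a tree: it must have exactly n - 1 edges and be connected. The function name and the sample schemas are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def is_tree_structured(schema: Dict[str, Set[str]]) -> bool:
    """True if no attribute is in more than two relations and the graph is a tree."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, attrs in schema.items():
        for a in attrs:
            owners[a].add(name)
    if any(len(rels) > 2 for rels in owners.values()):
        return False

    names = list(schema)
    # One edge per pair of relations that share at least one attribute.
    edges = [(u, v) for u, v in combinations(names, 2) if schema[u] & schema[v]]
    if len(edges) != len(names) - 1:
        return False  # a tree on n nodes has exactly n - 1 edges

    # With n - 1 edges, the graph is a tree iff it is connected (depth-first search).
    adjacent: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:
        adjacent[u].append(v)
        adjacent[v].append(u)
    seen = {names[0]}
    stack = [names[0]]
    while stack:
        for w in adjacent[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(names)


codd = {"EMPLOYEE": {"employee#", "department#"},
        "DEPARTMENT": {"department#", "manager#", "contract-type"}}
print(is_tree_structured(codd))   # True
cycle = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"C", "A"}}
print(is_tree_structured(cycle))  # False
```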

Theorem  5.2 

Let $p$ be a tree-structured database schema. If conditions 1 and 2 in lemma 5.4 are met for each pair of relations in $p$, then localized test => global test, for all instances of $p$. If condition 3 is also met, then independent test => global test.

Proof  (easy  generalization  of  lemma  5.4).  ■ 

With lemmas 5.2 and 5.3 we showed that all assumptions we make about the schemas are reasonable, in that they can be tested in polynomial time. As an example of tree-structured schemas where the independent test is correct, we present the familiar example from [Codd 72].

EMPLOYEE(employee#, department#)

DEPARTMENT(department#, manager#, contract-type)

with employee# $\to$ department# and department# $\to$ manager#, contract-type.

Changes in each relation may be performed independently. Thus, when we insert in an instance of the EMPLOYEE scheme, we only have to test for the satisfiability of the FD employee# $\to$ department#. There are no modification anomalies (in the original sense of Codd), since we have no side-effects in modifications; no attempt is made to artificially preserve CIP. Furthermore, the global consistency requirements under the (weak) universal relation model are guaranteed.
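A minimal sketch of what independent insertion looks like for this schema, assuming each relation keeps a hash index (a Python dict) on its key. The class `KeyedRelation`, its `insert` method and the sample values are hypothetical. Each insertion checks only the key dependency of its own relation, in expected constant time; in particular, inserting an employee never looks at DEPARTMENT.

```python
from typing import Dict, Hashable, Tuple


class KeyedRelation:
    """A relation whose only FD is key -> (other attributes), kept in a hash index."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: Dict[Hashable, Tuple[object, ...]] = {}

    def insert(self, key: Hashable, *rest: object) -> bool:
        """Independent test: reject the tuple only if it contradicts the key FD."""
        current = self.index.get(key)
        if current is not None and current != rest:
            return False  # same key, different dependent values
        self.index[key] = rest
        return True


employee = KeyedRelation("EMPLOYEE")      # employee# -> department#
department = KeyedRelation("DEPARTMENT")  # department# -> manager#, contract-type

print(employee.insert("e1", "d1"))                 # True
print(employee.insert("e1", "d2"))                 # False: violates employee# -> department#
print(department.insert("d1", "m1", "permanent"))  # True
```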


PART  IV 


CONCLUSION 


Lest  it  has  been  lost,  let  us  restate  the  principal  aim  of  this  thesis:  to  state 
clearly  and  precisely  the  problems  of  treating  imperfection  in  a  database  and  to 
provide  solutions  for  some  of  them.  In  this  last  part  of  the  thesis,  consisting  of  a 
single  chapter,  we  summarize  our  achievements  towards  that  goal,  discuss  the 
importance  of  our  results,  and  identify  some  important  new  research  areas  that 
our  work  opens. 


Chapter  6 


Concluding  Remarks 


There  is  nothing  new  except  what  has  been  forgotten 

-  Marie  Antoinette 


The treatment of imperfection in a database is a "real" problem, an important one, and also a hard one. We do not claim that by providing a formal treatment of null values in a relational database we have solved the problem in a universally acceptable manner. Our claim is that we have made a big step towards a solution by presenting an approach which:

-  is  both  intuitively  appealing  and  mathematically  elegant  and  precise; 

-  makes  the  practical  problems  of  imperfection  explicit  so  that  we  are  able  to 
provide  solutions  for  some  of  them. 

Perhaps the main achievement of any mathematization is, by selecting some specific formal structure, to render the statement of the practical problems more precise and the search for their solution more specific. Our formal structure is Scott's mathematical semantics. The embodiment of some forms of imperfection, namely the missing and inconsistent null values, in this framework is straightforward and natural. The meaning of these nulls and their relationship with the other regular values becomes clear. Of course, there are other forms of imperfection in a database. We do not believe that their treatment is possible in our framework. This is due to the lattice structure and the special positions of the two treated nulls in the lattices. On the other hand, we argue that these may be the most important and commonly occurring forms of imperfection. In particular, the missing null can be considered as an approximation to all other forms of imperfection.

In the first chapter we stated the problem, its importance in database management, and the rationale for its solution. The second chapter set the theoretical basis and context. In chapter 3 we considered queries and how they are extended in the presence of imperfection. The choice of the extension rule we adopted, the least extension rule in Scott's terms, was justified both semantically and syntactically. It was shown how the problem of tautological queries in approaches based on truth-functional, many-valued logics was avoided. We concentrated on a sophisticated treatment of the missing null. Still, a better treatment of the inconsistent null is feasible within our framework, at the expense of increased complexity. In particular, an alternative extension rule may be adopted, one that uses greatest lower bounds in query evaluation on inconsistent null values. This demonstrates the flexibility of our framework.

While the understanding of the problem and the proposal of a general theoretical solution may be interesting, the issue of "practicality" is much more important. Our proposed general solution is not implementable; query evaluation under it has high computational complexity. With this in mind, we also proposed an alternative but equivalent algorithm for query evaluation. It was shown that this algorithm requires considerably less computational effort. Still, even though the performance would be acceptable for typical database queries, there are cases (e.g. queries with a large number of terms) where the exponential factor of our query evaluation algorithm would be unmanageable. Our search for a substantially better algorithm, in terms of performance, was shown to have little hope of success, since the query evaluation problem is co-NP-complete. Of course, there may be other versions of our algorithm that work better in practice. An immediate extension to our work is the development of such algorithms and their average case analysis.

In this thesis we chose to use the predicate calculus-based class of relational query languages. Another extension to our work is the consideration of the relational algebra operators for relations with nulls. For some operators (e.g. projection) the extension is straightforward and was implicit in our work in chapter 5. The operators that introduce more difficulties are join and difference, because they both require the generalized closed world assumption. For instance, consider the join. We say that two tuples are joined if some attribute values "match". But a missing null can conceivably match any value appearing in a tuple of the other relation, or no value at all. If we adopt the former case, then a tuple with a null joins with all tuples in the other relation. Notice, though, that there is an exclusion among the tuples in the join: since missing represents exactly one value, only one of these tuples is actually in the join. In [Codd 79] and [Biskup 80] the idea of maybe-tuples having a status code associated with them is introduced to resolve these difficulties.

In chapter 3 we also discussed modification operations. We presented the semantics of modifications by giving some of their properties. Still, our treatment is at a high level of abstraction, and more work is needed in this area. In particular, we believe that for any research effort to provide a significant contribution to the area, it should take into consideration the user's intention with respect to the operations. This may be a philosophical problem.

In addition to queries and modifications, there are other concepts of interest for the management of a database. Semantic statements about the portion of the real world which we model, and their enforcement as constraints on database instances, are such concepts. Functional dependencies are the best understood semantic statements and, consequently, most of the work in the area has concentrated on them. In chapter 4 we presented the extension of FD interpretation using the least extension rule.

In enforcing the FDs as integrity constraints, we distinguish between a strong and a weaker notion of FD satisfiability. There is no fast method known for testing FD satisfiability in real databases, even when no nulls are present. Assuming that there are FDs which are not key dependencies, their satisfiability test would require time polynomial in the number of tuples in the instance. The best algorithm known is dominated by an $n \log n$ factor, where $n$ is the number of tuples: the time to sort the tuples. We showed that testing satisfiability in relations with nulls is still dominated by the same factor. Thus, the presence of nulls does not introduce additional difficulties in enforcing dependencies as constraints; it actually allows for more valid relation instances.
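The $n \log n$ bound for null-free relations comes from sorting: once the tuples are sorted on $X$, all tuples that agree on $X$ are adjacent, and one linear scan checks that each group has a single $Y$-value. The sketch below covers only this null-free case (it does not implement the thesis's treatment of nulls); the name `satisfies_fd` is hypothetical, and attribute values are assumed mutually comparable so that they can be sorted.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def satisfies_fd(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check lhs -> rhs on a null-free relation in O(n log n) time (the sort)."""

    def x_value(t: Row) -> Tuple[object, ...]:
        return tuple(t[a] for a in lhs)

    for _, group in groupby(sorted(rows, key=x_value), key=x_value):
        y_values = {tuple(t[b] for b in rhs) for t in group}
        if len(y_values) > 1:
            return False  # two tuples agree on X but differ on Y
    return True


emp = [{"E": 1, "D": 10}, {"E": 2, "D": 10}, {"E": 1, "D": 10}]
print(satisfies_fd(emp, ["E"], ["D"]))                        # True
print(satisfies_fd(emp + [{"E": 2, "D": 20}], ["E"], ["D"]))  # False
```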

Rules for the determination of "legal" values that a null can be substituted for
in  the  presence  of  FDs  were  presented.  We  also  presented  inference  rules  for 
FDs  which  were  shown  sound  and  complete  directly  for  strong  satisfiability,  and 
under  plausible  assumptions,  for  weak  satisfiability.  An  easy  extension  of  our 
work  is  the  consideration  of  other  dependencies,  e.g.  multivalued  dependencies, 
in  the  presence  of  imperfection.  One  of  the  reasons  we  do  not  discuss  them  is 
their  questionable  usefulness  for  practical  database  management  applications 
[Sciore  80],  [Korth  and  Ullman  80]. 

Showing that the traditional theory of functional dependencies goes through, even when nulls are present and only with small modifications, leads us to chapter 5. We believe our most important contribution is made here. The theory of relational database design and its limitations are discussed in the introductory sections of chapter 5. Some of the limitations of the traditional theory are due to its inability to deal with imperfection. To circumvent theoretical problems, an unacceptable assumption, the universal relation instance assumption, is introduced. The assumption was shown unacceptable not only semantically but also because, in the process of patching up theoretical gaps, it creates other, more serious, problems. The main reason for making this assumption is the inability to test for database consistency without it. This has a direct consequence for plausibly defining database equivalence, which is needed in database design. We showed how the database consistency test can be made without the UR-assumption when we allow for nulls, and we also demonstrated that all problems from its introduction are avoided.

Testing for database consistency is a very practical problem. Ideally, we would like to perform such a test for every database change. We now have ways of performing the test in polynomial time. This is in contrast to the exponential time required when the universal relation assumption is made and no null values are used. Even polynomial time is still impractical, since real databases are very large. The final topic in our thesis is the establishment of necessary and sufficient conditions for database schemas where the consistency test on their instances can be made fast. Fast in this context means independent of the database size. From what is currently known, to do this type of test, one of the necessary conditions is the presence of only key dependencies (i.e. BCNF schemas). When an index is kept for the key, or even better the key is hashed, all the mechanisms for trivial FD satisfaction are present. This is not a sufficient condition, though, since there is also a need to look at some inter-relational dependencies in a multi-relation database for global consistency. In this thesis, we showed that for a large class of database schemas, sufficient conditions can be found. We believe it is a good sign that these are the schemas which, intuitively, database practitioners have always considered good. The general case is an open problem.
Appendix


Axioms in C


We abbreviate the axioms using the definitions of "implication" ($\Rightarrow$) and "disjunction" ($\vee$):

$P \vee Q :=$

$P \Rightarrow Q := \neg P \vee Q$

The symbol "$\vdash$" indicates derivation from a set of rules (the empty set of rules for the axioms), and the symbol "$\models_2$" indicates a two-valued logic tautology.

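C1. $\vdash P \Rightarrow (Q \Rightarrow P)$

C2. $\vdash (P \Rightarrow (Q \Rightarrow R)) \Rightarrow ((P \Rightarrow Q) \Rightarrow (P \Rightarrow R))$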

C3. $\vdash (\neg P \Rightarrow \neg Q) \Rightarrow (Q \Rightarrow P)$

C4. If $\vdash P$ and $\vdash P \Rightarrow Q$, then $\vdash Q$

C5. $\vdash VP \Rightarrow P$

C6. $\vdash V(P \Rightarrow Q) \Rightarrow (VP \Rightarrow VQ)$

C7. $\vdash \neg VP \Rightarrow V\neg VP$

C8. If $\vdash P$, then $\vdash VP$

C9. If not $\models_2 (P \vee Q)$, then $\vdash V(P \vee Q) \Rightarrow (VP \vee VQ)$
CSRG-123  A FORMAL TREATMENT OF IMPERFECT INFORMATION
IN DATABASE MANAGEMENT
Yannis Vassiliou
[Ph.D. Thesis, DCS, September 1980]